Thursday, September 24, 2009

Fun with find

The find command is incredibly useful, if a bit arcane, so it's a shame that more Linux users aren't aware of it. Basically, if what you're trying to do sounds like "search for files with these characteristics, and do something to them", then find is probably the tool you're looking for.

Let's start with the very basics. If you run find with no arguments, it'll start listing all the files it can find. This isn't especially useful behavior, but let's take a closer look at it. find takes a list of paths, and looks at all files contained in each of them - if you don't give it any, it'll assume the current directory. We can introduce a bit of bash-fu to start doing something that looks like it might be useful:

$ find ${PATH//:/ }

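This will list all the programs that are available on your path. (The weird-looking variable reference just replaces colons with spaces, so each directory in PATH becomes a separate argument to find. It's bash's pattern-substitution form of parameter expansion - look under "Parameter Expansion" in the bash man page if you're curious.)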
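To see what that expansion does on its own, here is a minimal sketch. The variable demo_path is a made-up stand-in for PATH, and the syntax needs bash rather than plain sh. It also assumes no directory name contains a space, since the unquoted result is split on whitespace.

```shell
# demo_path is a hypothetical stand-in for $PATH
demo_path="/usr/local/bin:/usr/bin:/bin"

# ${var//:/ } replaces every colon with a space; left unquoted, the
# result is split into one word per directory
expanded=$(echo ${demo_path//:/ })
echo "$expanded"    # prints: /usr/local/bin /usr/bin /bin
```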

Just listing files is no fun, though. You can do that already, with ls, which also has the advantage of being a few fewer letters to type! So let's start getting into the real power of find, with expressions.

After the paths to search on, you can specify any number of expressions - basically, filters that look at the list of files and only select the ones matching some criteria. One simple one is -name:

$ find /usr/portage -name ChangeLog | wc -l

(If the bar thing looks funny to you, you need to read up on pipes. They're important enough that you can't really say you know how to use the command line without them.) This is a command I used just a few hours ago, to find out how many ChangeLog files there are in Gentoo's portage tree. Without the find command, this would have been kind of a pain. find also has an -iname filter that does a case-insensitive match - useful if you're looking for files that have inconsistent capitalization.
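If you don't have a portage tree handy, the difference between -name and -iname can be tried out on a throwaway directory. This is just a sketch: the directory layout is invented for the demonstration, and -iname is assumed to be available (it is in GNU and BSD find).

```shell
# Build a small scratch tree (the names are made up for the demo)
demo=$(mktemp -d)
mkdir -p "$demo/a" "$demo/b" "$demo/c"
touch "$demo/a/ChangeLog" "$demo/b/ChangeLog" "$demo/c/changelog"

# -name is case-sensitive, -iname is not
exact=$(find "$demo" -name ChangeLog | wc -l)
any_case=$(find "$demo" -iname changelog | wc -l)
echo "exact: $exact, any case: $any_case"

rm -rf "$demo"
```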

There are a lot of other possible filters, too many to list here, so you'll have to read the find man page to see them all. Here are just a few examples:

$ find ${PATH//:/ } -name "mkfs.*"
This is the earlier example, but with a twist - this prints out the full path to programs matching a given pattern. (If you only want one program, the which command is easier, though.)

$ find ~ -empty
This lists all empty files (and empty directories) anywhere under your home directory. Add -type f if you only want the zero-length files.

$ find / -user root
This will list all files owned by root. (You probably have to be root for this to actually list all of them, for obvious reasons.)

$ find / -size +500M
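This finds all files on your system larger than 500 megabytes (strictly mebibytes: find's M means units of 1048576 bytes), and requires some explanation. Filters that take numerical arguments can usually also take a + or - modifier, to mean "greater than this" or "less than this". If you leave the modifier out, find matches that exact value instead. One wrinkle with -size: find rounds each file's size up to a whole number of the unit you gave before comparing, so -size 500M matches anything above 499M up to and including 500M.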
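The three forms of a numeric argument are easier to see side by side. The sketch below uses byte units (the c suffix) so the rounding doesn't get in the way; the file names and sizes are invented for the demo.

```shell
# Three scratch files of 50, 100 and 200 bytes
demo=$(mktemp -d)
head -c 50  /dev/zero > "$demo/small.bin"
head -c 100 /dev/zero > "$demo/exact.bin"
head -c 200 /dev/zero > "$demo/large.bin"

larger=$(find "$demo" -type f -size +100c)    # more than 100 bytes
smaller=$(find "$demo" -type f -size -100c)   # less than 100 bytes
exactly=$(find "$demo" -type f -size 100c)    # exactly 100 bytes

echo "larger:  $larger"
echo "smaller: $smaller"
echo "exactly: $exactly"

rm -rf "$demo"
```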

$ find ~ -mmin -30
List all the files in your home directory that were modified in the past 30 minutes. (No more wondering about where you saved that important file!)
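Time-based filters are awkward to test on real files, so here is a sketch that fakes an old file by backdating it with touch -t. The file names and the timestamp are arbitrary choices for the demo.

```shell
demo=$(mktemp -d)
touch "$demo/fresh.txt"                    # modified just now
touch -t 200001010000 "$demo/stale.txt"    # backdated to 1 Jan 2000

# -type f keeps the scratch directory itself out of the results
recent=$(find "$demo" -type f -mmin -30)
echo "$recent"

rm -rf "$demo"
```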

$ find /usr/bin -not -executable
There shouldn't be any non-executable files there, but I found one on my system - probably a bug in the package that installed that file. (Want more logical operators? You can stick a -or between two filters and find will return the file if it matches either of them.)
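Here is a small sketch of -not and -or together, on invented files. One thing to watch: find's implicit "and" binds tighter than -or, so an -or group usually needs escaped parentheses around it. -not, -or and -executable are assumed to be the GNU find spellings.

```shell
demo=$(mktemp -d)
touch "$demo/run.sh" "$demo/notes.txt" "$demo/data.csv"
chmod 755 "$demo/run.sh"
chmod 644 "$demo/notes.txt" "$demo/data.csv"

# Files the current user cannot execute
not_exec=$(find "$demo" -type f -not -executable | sort)

# Files matching either pattern; the parentheses keep -type f
# applying to both halves of the -or
either=$(find "$demo" -type f \( -name '*.sh' -or -name '*.txt' \) | sort)

echo "$not_exec"
echo "$either"

rm -rf "$demo"
```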

"But wait," you might be thinking. "You said find would look for files and let me do stuff to them, but listing them isn't terribly interesting!" Don't worry, the fun is just beginning. :D

The simplest way to get find to do stuff with files is not to use find at all: pipe the output to xargs instead. For most simple tasks, this is way easier than using find's execution capabilities. The following three commands do basically the same thing:

$ find ~ -size 0 | xargs rm
$ find ~ -size 0 -exec rm "{}" +
$ find ~ -size 0 -delete

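The first one just pipes the list of files to xargs, which is a nifty little utility that takes the filenames it gets through the pipe and tacks them onto the end of the command it's given, packing as many into each run as will fit. In this case, it runs rm and deletes all the files it's passed, but you could use any command there. One caveat: xargs splits its input on whitespace, so a filename containing a space or newline gets chopped into pieces. If that could happen, use find's -print0 together with xargs -0, which separate the names with NUL bytes instead.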
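A sketch of the whitespace-safe variant of the xargs pipeline, assuming a find and xargs that support -print0 and -0 (GNU and BSD both do). The files are invented for the demo, including one with a space in its name.

```shell
demo=$(mktemp -d)
: > "$demo/plain.txt"                    # empty
: > "$demo/with space.txt"               # empty, awkward name
printf 'keep me\n' > "$demo/keep.txt"    # not empty

# NUL-delimited names survive spaces and newlines intact
find "$demo" -type f -size 0 -print0 | xargs -0 rm

remaining=$(find "$demo" -type f)
echo "$remaining"

rm -rf "$demo"
```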

The second one uses find's -exec option, which gives you more control over how the command is constructed. After the -exec, you find rm, which is pretty self explanatory - it's the command you're executing. The "{}" thing is find's weird way of saying "the file that was found" - this is where the filename gets substituted into the command. The + ends the command, but there's a twist here. If you end the command with a semicolon instead, find runs the command once for each file. (NB: you have to quote or escape the semicolon, as ";" or \;, or bash treats it as its own command separator and find never sees it. This took me forever to figure out :( ) If you use +, on the other hand, it terminates the command in the same way, but it also tells find to jam as many filenames as it can in there, subject to whatever limitations the OS imposes. The catch is that with +, the {} has to come last, immediately before the +. For large file lists, this can be the difference between your command running thousands of times or just a few times, so using + wherever possible is a good habit to get into.
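The difference between the two terminators shows up if you swap rm for echo and count the output lines, since each echo invocation prints exactly one line. A minimal sketch on three invented files:

```shell
demo=$(mktemp -d)
touch "$demo/a" "$demo/b" "$demo/c"

# \; runs echo once per file, so three lines come out
per_file=$(find "$demo" -type f -exec echo {} \; | wc -l)

# + hands all three names to a single echo, so one line comes out
batched=$(find "$demo" -type f -exec echo {} + | wc -l)

echo "per file: $per_file, batched: $batched"

rm -rf "$demo"
```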

The third is mainly for completeness - find has a builtin function for deleting files, making this example a bit pointless. :)

Here are a few more practical examples.

$ find /usr/portage -name ChangeLog -exec du -c "{}" + | grep total
I used this to find the total disk space on my system taken up by ChangeLog files in the portage tree. "du -c" will print out the total disk space used by all the files you give it, and the grep filters the output down to just those totals.

$ find -type d -exec chmod 755 "{}" +
Somehow I had a pile of directories on my NFS share that were missing execute permission for everyone but the owner, so other users couldn't even enter those directories. This fixed all that in a single command, without touching the permissions of the files inside them.
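To convince yourself that -type d really leaves regular files alone, here is a sketch on a scratch tree. The layout and the starting modes are invented for the demo.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/share/docs" "$demo/share/music"
chmod 700 "$demo/share/docs" "$demo/share/music"   # locked-down directories
touch "$demo/share/readme.txt"
chmod 644 "$demo/share/readme.txt"

# Only directories are handed to chmod
find "$demo/share" -type d -exec chmod 755 {} +

dirs_fixed=$(find "$demo/share" -type d -perm 755 | wc -l)
files_untouched=$(find "$demo/share" -type f -perm 644 | wc -l)
echo "directories at 755: $dirs_fixed, files still 644: $files_untouched"

rm -rf "$demo"
```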

$ find ${PATH//:/ } -perm -4111 -user root
Shows you all binaries available on your path that are suid root. These can be serious security risks if the programs are written insecurely.
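The leading dash in -perm -4111 means "at least all of these bits are set": the setuid bit (4000) plus execute for user, group and other (111). The sketch below tries that on two invented files, leaving out -user root so it works as an ordinary user. If you'd rather catch every setuid file regardless of its execute bits, -perm -4000 is the looser test.

```shell
demo=$(mktemp -d)
touch "$demo/normal" "$demo/suid"
chmod 0755 "$demo/normal"   # ordinary executable
chmod 4755 "$demo/suid"     # same, plus the setuid bit

# Matches only files with setuid and all three execute bits set
found=$(find "$demo" -type f -perm -4111)
echo "$found"

rm -rf "$demo"
```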

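$ find . -path ./archive -prune -o -type f -mtime +365 -exec mv -t archive/ "{}" +
Moves all files that haven't been modified in more than a year to another directory. Because {} has to sit directly in front of the +, the destination is passed with mv's -t option (GNU coreutils) rather than at the end of the command. The -path ./archive -prune part keeps find from descending into the archive itself, and -type f restricts the move to regular files. Note that files with the same name from different directories will overwrite each other in archive/.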
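One detail worth checking when moving files this way: in the + form, {} must be the last argument before the +, so the destination has to go first, via mv -t (a GNU coreutils option, assumed here). The sketch below fakes a stale file with touch -t and skips the archive directory with -prune; all names are invented.

```shell
demo=$(mktemp -d)
mkdir "$demo/archive"
touch "$demo/current.txt"
touch -t 200001010000 "$demo/ancient.txt"   # backdated to 1 Jan 2000

# Run from inside the scratch directory so the paths are relative
(
  cd "$demo" || exit 1
  find . -path ./archive -prune -o -type f -mtime +365 -exec mv -t archive/ {} +
)

archived=$(ls "$demo/archive")
left=$(find "$demo" -maxdepth 1 -type f)
echo "archived: $archived"
echo "left in place: $left"

rm -rf "$demo"
```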

$ find -nouser -exec chown root "{}" + , -nogroup -exec chgrp root "{}" +
Find all files that are owned by a nonexistent user or group, and change that ownership to root. Note the comma in there; it splits up the expression so that you can operate on multiple sets of files in a single find command, and only have to actually scan the directory tree once. If you want to do something like this in the absolute fastest way, find is your friend.
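Changing ownership needs root, so here is a sketch of the same comma trick using chmod on invented files instead. The comma operator is assumed to be the GNU find one: both sides are evaluated for every file, in a single pass over the tree.

```shell
demo=$(mktemp -d)
touch "$demo/app.log" "$demo/cache.tmp" "$demo/notes.txt"
chmod 644 "$demo/app.log" "$demo/cache.tmp" "$demo/notes.txt"

# One traversal, two independent filter-and-action pairs
find "$demo" -name '*.log' -exec chmod 600 {} + , -name '*.tmp' -exec chmod 640 {} +

log_ok=$(find "$demo" -name app.log -perm 600 | wc -l)
tmp_ok=$(find "$demo" -name cache.tmp -perm 640 | wc -l)
txt_ok=$(find "$demo" -name notes.txt -perm 644 | wc -l)
echo "log: $log_ok, tmp: $tmp_ok, untouched txt: $txt_ok"

rm -rf "$demo"
```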

That's about the limit of my knowledge, but the find man page has loads more information, as well as some more examples.
