Wednesday, April 13, 2011

grouping predicates in find

This part " ( -name *txt -o -name *html ) " confuses me in the code:

find $HOME \( -name \*txt -o -name \*html \) -print0 | xargs -0 grep -li vpn

Can someone explain the brackets and "-o"? Is "-o" a command or a parameter? I know the brackets are escaped by "\", but what are they for?

From stackoverflow
  • The "-o" means OR. I.e., name must end in "txt" or "html". The brackets just group the two conditions together.

  • The ( and ) provide a way to group search parameters for the find command. The -o is an "or" operator.

    This find command will find all files ending in "txt" or "html" and pass those as arguments to the grep command.

  • By default, the conditions in the find argument list are 'and'ed together. The -o option means 'or'.

    If you wrote:

    find $HOME -name \*txt -o -name \*html -print0
    

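    then the implicit 'and' binds more tightly than -o, so the -print0 action is attached only to the 'html' test. There is no output action associated with the file names that end with 'txt', so they would not be printed. By grouping the name tests with parentheses, you apply the action to both the 'html' and 'txt' files.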


    Consider the example:

    mkdir test-find
    cd test-find
    cp /dev/null file.txt
    cp /dev/null file.html
    

    The comments below have an interesting side-light on this. If the command was:

    find . -name '*.txt' -o -name '*.html'
    

    then, since no explicit action is specified for either alternative, the default -print (not -print0!) action is used for both alternatives and both files are listed. If -print or another explicit action follows one of the alternatives (but not the other), only the alternative with the action produces output:

    find . -name '*.txt' -print -o -name '*.html'
    

    This also suggests that you could have different actions for the different alternatives. You could also apply other conditions, such as a modification time:

    find . \( -name '*.txt' -o -name '*.html' \) -mtime +5 -print0
    
    find . \( -name '*.txt'  -mtime +5 -o -name '*.html' \) -print0
    

    The first prints txt or html files older than 5 days (so it prints nothing for the example directory - the files are a few seconds old); the second prints txt files older than 5 days or html files of any age (so just file.html). And so on...

    Thanks to DevSolar for his comments leading to this addition.

    DevSolar : Erm... I see from your comment on my answer that the parentheses seem to be required on SUN, but it works excellently without them on both Linux (GNU find) and AIX (AIX find). Could you elaborate why exactly it would fail without them? According to 'man find', the parens are for precedence only...
    Jonathan Leffler : mkdir find-example; cd find-example; >file.txt; >file.html; find . -name '*.txt' -o -name '*.html' -print; -- on Solaris, both GNU and Solaris 'find' list just file.html.
    DevSolar : Heh... funny. I always omit the '-print' as it's the default action anyway. It turns out that, if no '-print' is given, *both* files are printed - only if you explicitly set '-print', that binds stronger with '-name \*html' (because it's an AND), and breaks the whole expression.
    Jonathan Leffler : Once upon a long time ago, print wasn't the default, so I usually use it. The '-print0' in the Q is not default. I confirm that w/o the '-print' you get both files listed in my example - GNU and Solaris finds. Interesting sidelight - thanks, DevSolar.
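
The behaviour described in the answers and comments above can be checked with a small script. This is only a sketch under a few assumptions: a POSIX-style sh with mktemp available, and a find/xargs pair that supports -print0 and -0 (the GNU and BSD versions do). The variable names and the show helper are made up for illustration. Where output is captured into a variable, -print stands in for -print0, because command substitution cannot hold NUL bytes.

```shell
#!/bin/sh
# Sketch only: reproduces the test-find example in a throwaway directory.
start_dir=$PWD
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1
printf 'VPN settings\n' > file.txt   # matches "vpn" case-insensitively
: > file.html                        # empty file

# 1. No explicit action: the default -print covers the whole expression.
no_action=$(find . -name '*.txt' -o -name '*.html' | LC_ALL=C sort)

# 2. The implicit AND binds tighter than -o, so -print belongs to the
#    '*.html' test only.
print_on_html=$(find . -name '*.txt' -o -name '*.html' -print | LC_ALL=C sort)

# 3. The mirror image: -print belongs to the '*.txt' test only.
print_on_txt=$(find . -name '*.txt' -print -o -name '*.html' | LC_ALL=C sort)

# 4. Parentheses group the alternatives, so -print applies to both.
grouped=$(find . \( -name '*.txt' -o -name '*.html' \) -print | LC_ALL=C sort)

# 5. A different action for each alternative.
per_alt=$(find . -name '*.txt' -exec echo text: {} \; \
              -o -name '*.html' -exec echo html: {} \; | LC_ALL=C sort)

# 6. Either name, but only if modified more than 5 days ago (none are).
grouped_old=$(find . \( -name '*.txt' -o -name '*.html' \) -mtime +5 -print)

# 7. Old txt files, or html files of any age.
txt_old_or_html=$(find . \( -name '*.txt' -mtime +5 -o -name '*.html' \) -print)

# 8. The pipeline from the question, run against the demo directory.
vpn_hits=$(find . \( -name '*.txt' -o -name '*.html' \) -print0 |
           xargs -0 grep -li vpn | LC_ALL=C sort)

# Print a label followed by the captured output.
show() { printf '%s\n%s\n\n' "$1" "$2"; }
show '1. no action:'             "$no_action"
show '2. -print after *.html:'   "$print_on_html"
show '3. -print after *.txt:'    "$print_on_txt"
show '4. grouped, -print:'       "$grouped"
show '5. action per alternative:' "$per_alt"
show '6. grouped, -mtime +5:'    "$grouped_old"
show '7. old txt or any html:'   "$txt_old_or_html"
show '8. files mentioning vpn:'  "$vpn_hits"

# Return to the starting directory and remove the throwaway files.
cd "$start_dir" && rm -rf "$demo_dir"
```

The sort calls are there only to make the order of the listings predictable; find itself reports entries in whatever order the directory returns them.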
