Re: An appropriate directory search tool?
On Mon 22 Oct 2018 at 09:09:12 (-0400), Greg Wooledge wrote:
> On Sun, Oct 21, 2018 at 08:48:28AM -0500, David Wright wrote:
> > On Sun 21 Oct 2018 at 05:25:05 (-0500), Richard Owlett wrote:
> > > I wish a list of files with a specific extension in a directory which
> > > contain keywordA but not keywordB. Recursing down the directory tree
> > > was the primary objection to the MATE search tool.
↑↑↑↑↑↑↑↑↑
> >
> > At last, a direct question!
> >
> > $ grep -L keywordB $(grep -l keywordA a-directory/*extension)
> >
> > Mix with quotes according to taste and needs.
>
> That doesn't recurse (it only considers files at depth 1 in a single
> subdirectory),
Specifically required by the OP.
> and it falls apart on filenames with whitespace.
Left as an exercise for the reader.
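For anyone who would rather skip the exercise, here is one possible sketch, assuming a POSIX sh and grep. The function name match_a_not_b and its argument order are invented for illustration. It loops over the glob rather than using command substitution, so names with whitespace survive, and grep -L is never handed an empty file list (in which case it would sit reading stdin).

```shell
# Hypothetical helper: list files in ONE directory (no recursion) whose
# names end in SUFFIX and which contain KEYA but not KEYB.
# usage: match_a_not_b DIR SUFFIX KEYA KEYB
match_a_not_b() {
    dir=$1 suffix=$2 keyA=$3 keyB=$4
    for f in "$dir"/*"$suffix"; do
        [ -f "$f" ] || continue                 # skip an unmatched glob and non-files
        grep -q -e "$keyA" -- "$f" || continue  # must contain keyA
        grep -q -e "$keyB" -- "$f" && continue  # must not contain keyB
        printf '%s\n' "$f"
    done
}
```

Something like match_a_not_b a-directory extension keywordA keywordB would then stand in for the one-liner above, at the cost of running grep up to twice per file.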
> If we ignore the recursion part for a moment, I have a FAQ for the
> "match A but not B" part:
>
> https://mywiki.wooledge.org/BashFAQ/079
>
> The specific example for this case (foo but NOT bar) is at the bottom:
>
> awk '/foo/{good=1} /bar/{good=0;exit} END{exit !good}'
>
> So, all we have to do is write the recursion and extension-filtering
> parts and link them together with the awk command. This is fairly
> straightforward with the standard tools.
>
> find . -type f -name '*.myext' -exec \
>     awk '/keywordA/{good=1} /keywordB/{good=0;exit} END{exit !good}' {} \; -print
>
>
> Testing:
>
> wooledg:~$ mkdir /tmp/x && cd "$_"
> wooledg:/tmp/x$ mkdir -p a/b/c a/b/d
> wooledg:/tmp/x$ echo keywordA > a/b/c/good.myext
> wooledg:/tmp/x$ echo keywordA keywordB > a/b/d/bad.myext
> wooledg:/tmp/x$ find . -type f -name '*.myext' -exec \
> > awk '/keywordA/{good=1} /keywordB/{good=0;exit} END{exit !good}' {} \; -print
> ./a/b/c/good.myext
>
>
> Now, the obvious unstated part of the question is that he will want
> keywordA and keywordB to be passed as parameters (although knowing him,
> he will require 17 messages to tell us this).
>
> This is where it actually gets "hard", because the obvious thing to do
> would be to change the quotes on the awk command and embed $1 and $2 in
> it directly. That is a TRAP. It's a code injection bug, because the
> parameters given by the user could contain code that is meaningful to awk,
> which would lead to unexpected results.
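To make the trap concrete, here is a small sketch of my own (it is not taken from the FAQ). The function names unsafe_match and safe_match and the sample "pattern" are made up for illustration. The first function pastes the parameter into the awk program text; the second hands it over as data in an awk variable.

```shell
# UNSAFE, for illustration only: $1 becomes part of the awk program itself.
unsafe_match() {
    awk "/$1/{print}"
}

# Safer: $1 is passed as data and only ever used as a regular expression.
safe_match() {
    awk -v pat="$1" '$0 ~ pat {print}'
}

# A hostile "pattern" that closes the regex early and smuggles in system().
evil='x/ && system("echo INJECTED") == 0 && /x'

printf 'x\n' | unsafe_match "$evil"   # awk executes the smuggled echo command
printf 'x\n' | safe_match "$evil"     # the whole string is just a regex that fails to match
```

With the unsafe version the user's string is parsed as awk code, so the echo runs; swap it for something less friendly and the problem is obvious.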
>
> For that part of the program, I refer you to:
>
> https://mywiki.wooledge.org/BashProgramming/05
>
> I would use the "awk variables" approach for this one:
>
> #!/bin/sh
> if test "$#" != 2; then
>     printf "usage: %s goodpat badpat\n" "$0" >&2
>     exit 1
> fi
>
> find . -type f -name '*.myext' -exec \
>     awk -v goodpat="$1" -v badpat="$2" \
>     '$0 ~ goodpat {good=1} $0 ~ badpat {good=0;exit} END{exit !good}' {} \; \
>     -print
>
>
> And, testing:
>
> wooledg:/tmp/x$ set -- wordA wordB
> wooledg:/tmp/x$ find . -type f -name '*.myext' -exec \
> > awk -v goodpat="$1" -v badpat="$2" \
> > '$0 ~ goodpat {good=1} $0 ~ badpat {good=0;exit} END{exit !good}' {} \; \
> > -print
> ./a/b/c/good.myext
>
>
> And then, the obvious next extension after THAT would be to make the
> filename extension a parameter. The shell part of that one is super
> easy (no code injection problems with find -name), so I won't bother
> showing it.
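For completeness, one way that extension might look, written as a function so it can be tried at a prompt. This is only a sketch: the name find_a_not_b and the choice to put the extension first are my assumptions, not anything settled in the thread. Note that find -name treats the extension as a glob pattern, so characters like * or [ in it are matched as wildcards rather than literally.

```shell
# Hypothetical interface: find_a_not_b EXT GOODPAT BADPAT
# Recursively lists files under . named *.EXT that contain a line
# matching GOODPAT and no line matching BADPAT.
find_a_not_b() {
    if test "$#" != 3; then
        printf 'usage: find_a_not_b ext goodpat badpat\n' >&2
        return 1
    fi
    find . -type f -name "*.$1" -exec \
        awk -v goodpat="$2" -v badpat="$3" \
        '$0 ~ goodpat {good=1} $0 ~ badpat {good=0;exit} END{exit !good}' {} \; \
        -print
}
```

In the test tree from earlier, find_a_not_b myext wordA wordB should list only ./a/b/c/good.myext.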
>
> At that point, the user interface becomes the real issue. Do you
> put the extension argument first, or last? Do you make it an option?
> Do you hardcode a default extension, or does the lack of a specified
> extension mean that you drop the -name filter altogether? Or do you
> give up the command line interface entirely, and go with a Tk dialog?
>
> But he'll never, ever, EVER be able to answer those questions, so we
> won't have to worry about it.
No, but we all learn something from these posts; at least, I do.
Cheers,
David.