On Sat, Jun 20, 2020 at 12:20:41PM +0200, Albretch Mueller wrote:
>  _X=".\(html\|txt\)"
>  _SDIR="$(pwd)"
>
>  _AR_TERMS=(
>   Kant
>   "Gilbert Ryle"
>   Hegel
>  )
>
>  for iZ in ${!_AR_TERMS[@]}; do
>   find "${_SDIR}" -type f -iregex .*"${_X}" -exec grep -il "${_AR_TERMS[$iZ]}" {} \;
>  done # iZ: terms search/grep'ped inside text files; echo "~";
>
>  # this would be much faster
>
>  find "${_SDIR}" -type f -iregex .*"${_X}" -exec grep -il "Kant\|Gilbert Ryle\|Hegel" {} \;
>
>  but how do I know which match happened in order to save it into separate files?

Hm. The first approach goes through your files three times, once for each
term. The second goes through them once, with a combined regular expression,
so no wonder the second approach is faster.

But to actually attack the problem you should be aware that the second
method is doing *something different* from the first one: "grep -l" stops
at the first hit, so even if you could ask grep which one of the
alternatives it found, it would miss Hegel in a file where Kant appears
first. Is that what you want?

Once you have answered that question, you'll be able to proceed. One
possibility is to postprocess your output: grep prints each matching line,
and you can match that against the individual terms; you'd have to drop
the "-l" for that, making things somewhat slower. Another possibility is
to keep the "-l" and to re-grep the files it finds against each of the
individual patterns.

Cheers
-- t
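[Editor's sketch of the "re-grep" approach suggested above: one find pass
with the combined pattern, then one "grep -l" per individual term over
only the files that survived. The function name search_terms and the
results_*.txt output files are my own invention for illustration, not
from the original post.]

```shell
#!/usr/bin/env bash
# Re-grep sketch: pass 1 narrows the file set with the fast combined
# pattern; pass 2 greps only those files, once per term, and writes
# each term's matching file names to its own list (results_*.txt --
# hypothetical naming, adjust to taste).
search_terms() {
  local _SDIR="$1"; shift
  local _AR_TERMS=( "$@" )
  local _X=".\(html\|txt\)"

  # Build the combined alternation, e.g.  Kant\|Gilbert Ryle\|Hegel
  local _COMBINED
  _COMBINED="$(printf '%s\\|' "${_AR_TERMS[@]}")"
  _COMBINED="${_COMBINED%\\|}"          # drop the trailing \|

  # Pass 1: one walk through the tree; keep files matching ANY term.
  local _HITS=()
  mapfile -t _HITS < <(
    find "${_SDIR}" -type f -iregex ".*${_X}" \
         -exec grep -il "${_COMBINED}" {} +
  )
  [ "${#_HITS[@]}" -gt 0 ] || return 0

  # Pass 2: re-grep only the hit files, once per individual term.
  local _TERM _OUT
  for _TERM in "${_AR_TERMS[@]}"; do
    _OUT="results_${_TERM// /_}.txt"    # spaces -> underscores
    grep -il "${_TERM}" "${_HITS[@]}" > "${_OUT}"
  done
}
```

Because pass 2 runs against the (presumably small) hit list rather than
the whole tree, a file containing both Kant and Hegel ends up in both
result lists, which the single combined grep could not tell you.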