Re: one liner, how do you know which match happened ...

To: debian-user@lists.debian.org
Subject: Re: one liner, how do you know which match happened ...
From: davidson <davidson@freevolt.org>
Date: Sat, 20 Jun 2020 21:01:42 +0000 (UTC)
Message-id: <alpine.DEB.2.21.2006202100410.14226@azone.org>
In-reply-to: <CAFakBwjfPkyo01JwnutCjw5OCR1BPztFKMYHPZYXnumA=3y81Q@mail.gmail.com>
References: <CAFakBwjfPkyo01JwnutCjw5OCR1BPztFKMYHPZYXnumA=3y81Q@mail.gmail.com>


On Sat, 20 Jun 2020, Albretch Mueller wrote:

_X=".\(html\|txt\)"
_SDIR="$(pwd)"

_AR_TERMS=(
Kant
"Gilbert Ryle"
Hegel
)

for iZ in "${!_AR_TERMS[@]}"; do
find "${_SDIR}" -type f -iregex ".*${_X}" -exec grep -il \
"${_AR_TERMS[$iZ]}" {} \;
done # iZ: terms search/grep'ped inside text files;  echo "~";


# this would be much faster

find "${_SDIR}" -type f -iregex ".*${_X}" -exec grep -il \
"Kant\|Gilbert Ryle\|Hegel" {} \;


There is also

  -exec command '{}' +

instead of the -exec command '{}' ';' version. You could compare them.

I believe there was a recent thread here about this.
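
A minimal sketch of that comparison, in case it helps. The scratch directory and the three sample files are made up for illustration; only the two -exec forms come from the discussion above.

```shell
# Scratch tree (hypothetical sample files), so nothing real is touched.
tmp=$(mktemp -d)
printf 'Kant wrote this.\n'  > "$tmp/a.txt"
printf 'Hegel wrote that.\n' > "$tmp/b.html"
printf 'Nothing relevant.\n' > "$tmp/c.txt"

# ';' form: find starts one grep process per file.
per_file=$(find "$tmp" -type f -exec grep -il 'Kant\|Hegel' '{}' ';' | sort)

# '+' form: find passes many file names to each grep process.
batched=$(find "$tmp" -type f -exec grep -il 'Kant\|Hegel' '{}' + | sort)

# With -l both forms should list the same files.
[ "$per_file" = "$batched" ] && echo "same files listed"

rm -rf "$tmp"
```

One thing to watch once -l is dropped: grep prints the file name prefix by default only when it is given more than one file, so the ';' form loses the prefix. Adding -H asks for the prefix in every case.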

but how do I know which match happened in order to save it into
separate files?


Remove the -l flag and use the -o flag.

 $ man grep # Read it now. Read it later too.
 ...
 -l, --files-with-matches
       Suppress normal output; instead print the name of each input
       file from which output would normally have been printed.  The
       scanning will stop on the first match.
 ...
 -o, --only-matching
       Print only the matched (non-empty) parts of a matching line,
       with each such part on a separate output line.
 ...
 -n, --line-number
       Prefix each line of output with the 1-based line number within
       its input file.
 ...
 -Z, --null
       Output a zero byte (the ASCII NUL character) instead of the
       character that normally follows a file name.  For example, grep
       -lZ outputs a zero byte after each file name instead of the
       usual newline.  This option makes the output unambiguous, even
       in the presence of file names containing unusual characters
       like newlines.  This option can be used with commands like find
       -print0, perl -0, sort -z, and xargs -0 to process arbitrary
       file names, even those that contain newline characters.
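
To make the difference concrete, here is a small sketch on a made-up file (notes.txt and its contents are assumptions, chosen only for illustration). It runs the same pattern once with -l and once with -o, -n and -H.

```shell
# Scratch file (hypothetical content) with two matching lines out of three.
tmp=$(mktemp -d)
printf 'Kant and Hegel\nno match here\nkant again\n' > "$tmp/notes.txt"

# -l: prints only the file name; scanning stops at the first match.
names=$(grep -il 'Kant\|Hegel' "$tmp/notes.txt")

# -o -n -H: one output line per match, shaped as file:line:matched-text.
matches=$(grep -inoH 'Kant\|Hegel' "$tmp/notes.txt")

printf '%s\n' "$names" "$matches"

rm -rf "$tmp"
```

The second form keeps the matched text on every output line, which is what tells you which alternative of the pattern was hit.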

grep doesn't do replacements:

https://stackoverflow.com/questions/16197406/grep-regex-replace-specific-find-in-text-file


No. Do you want replacements? You haven't asked for that.

but at least (in my way to understand reality, since it must try such
searches sequentially) it should give you the index of the match


It can. See the -n flag above.

$ find . -type f \
  \( -name "*.[Hh][Tt][Mm][Ll]" -o -name "*.[Tt][Xx][Tt]" \) \
  -exec grep -ino "Waldo\|Ice Cream"  '{}' + | uniq
[...]
./wikipedia.org/Transport_Layer_Security.html:999:Ice Cream
./wikipedia.org/Transport_Layer_Security.html:1623:Ice Cream
./wikipedia.org/Edward_R_Murrow.html:338:Waldo
./wikipedia.org/Firefox_version_history.html:3219:Ice Cream
./wikipedia.org/Firefox_version_history.html:3710:Ice Cream
./wikipedia.org/Firefox_version_history.html:6021:Ice Cream
./wikipedia.org/Rwandan_genocide.html:993:Waldo
./wikipedia.org/Agar.html:61:Ice cream
./wikipedia.org/Agar.html:61:ice cream
./wikipedia.org/Turmeric.html:341:ice cream
./wikipedia.org/There_Will_Be_Blood.html:150:ice cream
./wikipedia.org/Boko_haram.html:1328:ice cream
./wikipedia.org/Phantasm_(franchise).html:126:ice cream
./wikipedia.org/Phantasm_(franchise).html:226:ice cream
./wikipedia.org/Thundarr_the_Barbarian.html:153:Waldo
./wikipedia.org/Ecuador.html:1487:waldo
./wikipedia.org/French_Guiana.html:886:Ice cream
./wikipedia.org/French_Guiana.html:886:ice cream
./wikipedia.org/Henry_david_thoreau.html:136:Waldo
./wikipedia.org/Henry_david_thoreau.html:235:Waldo
[...]
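
As for saving each match into separate files, one possible approach (a sketch under assumptions, not something tested on the tree above) is to pipe output of this shape into awk and let the matched text pick the output file. The scratch directories, the sample files and the ".matches" suffix are all made up for illustration.

```shell
# Scratch input and output directories (hypothetical names and content).
tmp=$(mktemp -d)
mkdir "$tmp/in" "$tmp/out"
printf 'Waldo likes Ice Cream\nwaldo again\n' > "$tmp/in/a.txt"
printf 'ice cream only\n' > "$tmp/in/b.html"

find "$tmp/in" -type f \
  \( -name "*.[Hh][Tt][Mm][Ll]" -o -name "*.[Tt][Xx][Tt]" \) \
  -exec grep -inoH "Waldo\|Ice Cream" '{}' + |
awk -F: -v out="$tmp/out" '{
  term = tolower($NF)      # the matched text is the last colon-separated field
  gsub(/ /, "_", term)     # "ice cream" becomes "ice_cream"
  print > (out "/" term ".matches")   # whole file:line:match line, one file per term
}'

# Count the lines collected for each term before cleaning up.
waldo_count=$(wc -l < "$tmp/out/waldo.matches")
ice_cream_count=$(wc -l < "$tmp/out/ice_cream.matches")
echo "waldo: $waldo_count, ice cream: $ice_cream_count"

rm -rf "$tmp"
```

Taking the last field rather than the third keeps this working when a path itself contains a colon. It still assumes that no search term contains a colon and that no file name contains a newline.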

and if grep doesn't do that I am sure some other batch utility would


"Batch utility" sounds like COBOL talk, or something else paleographic
and interesting.

(I have never used sed in my code)


I believe sed is a write-only language that is overwhelmingly useful
at the command line.

If I find myself putting it in a script, I try to find another way.
For example, awk is far nicer to look at.

--
@almightygenie 8 Jun 2020 | @Windex
  Thanks, Windex. That's a relief. Your drink is even more refreshing
  now that I know that it deplores racism & discrimination.
twitter.com/almightygenie/status/1270096054864809988
