Re: one liner, how do you know which match happened ...
On Sat, 20 Jun 2020, Albretch Mueller wrote:
> _X="\.\(html\|txt\)"
> _SDIR="$(pwd)"
> _AR_TERMS=(
> Kant
> "Gilbert Ryle"
> Hegel
> )
> for iZ in "${!_AR_TERMS[@]}"; do
> find "${_SDIR}" -type f -iregex ".*${_X}" -exec grep -il "${_AR_TERMS[$iZ]}" {} \;
> done # iZ: terms search/grep'ped inside text files; echo "~";
> # this would be much faster
> find "${_SDIR}" -type f -iregex ".*${_X}" -exec grep -il "Kant\|Gilbert Ryle\|Hegel" {} \;
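There is also
-exec '{}' +
instead of the -exec '{}' ';' version. With '+', find hands many file
names to each grep run instead of starting one grep per file. You could
compare them. I think there was a recent thread here about this.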
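A quick way to see the difference between the two forms, as a sketch: echo stands in for grep, and the scratch directory and file names are made up for the demo. Counting output lines counts how many times find started the command.

```shell
#!/usr/bin/env bash
# Sketch: count how often find starts the command with ';' versus '+'.
# echo stands in for grep; the scratch files are made up for the demo.
set -eu
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
touch "$demo/a.txt" "$demo/b.txt" "$demo/c.html"

# ';' : one echo per file, so one output line per file.
# $(( )) strips the padding some wc implementations add.
per_file=$(( $(find "$demo" -type f -exec echo {} \; | wc -l) ))

# '+' : find packs as many names as fit into each echo call,
# so three short names come back on a single line.
batched=$(( $(find "$demo" -type f -exec echo {} + | wc -l) ))

echo "echo runs with ';': $per_file"
echo "echo runs with '+': $batched"
```

With grep -il in place of echo, both forms print the same file names; only the number of grep processes changes.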
> but how do I know which match happened in order to save it into
> separate files?
Remove -l flag, use -o flag.
$ man grep # Read it now. Read it later too.
...
-l, --files-with-matches
Suppress normal output; instead print the name of each input
file from which output would normally have been printed. The
scanning will stop on the first match.
...
-o, --only-matching
Print only the matched (non-empty) parts of a matching line,
with each such part on a separate output line.
...
-n, --line-number
Prefix each line of output with the 1-based line number within
its input file.
...
-Z, --null
Output a zero byte (the ASCII NUL character) instead of the
character that normally follows a file name. For example, grep
-lZ outputs a zero byte after each file name instead of the
usual newline. This option makes the output unambiguous, even
in the presence of file names containing unusual characters
like newlines. This option can be used with commands like find
-print0, perl -0, sort -z, and xargs -0 to process arbitrary
file names, even those that contain newline characters.
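Put together, a sketch of a one-pass version that files each matching file under the term it matched. It assumes GNU or BSD find and grep (-iname, -H and -o are not POSIX); the scratch directories, sample files and the per-term ".list" naming are made up for the example.

```shell
#!/usr/bin/env bash
# Sketch: one find/grep pass; -o reports which term matched, so each file
# can be listed under that term. Directories and file names are made up.
set -eu
src=$(mktemp -d); out=$(mktemp -d)
trap 'rm -rf "$src" "$out"' EXIT
printf 'Kant and Hegel\nKant again\n' > "$src/one.txt"
printf '<p>gilbert ryle</p>\n'        > "$src/two.html"
printf 'nothing here\n'               > "$src/three.txt"

terms=(Kant "Gilbert Ryle" Hegel)
args=()
for t in "${terms[@]}"; do args+=(-e "$t"); done   # one -e per term

# -H: always print the file name; -i: ignore case;
# -o: print only the match; -F: terms are plain strings, not regexes.
find "$src" -type f \( -iname '*.html' -o -iname '*.txt' \) \
    -exec grep -HioF "${args[@]}" {} + |
  awk -F: -v out="$out" '{
    key = tolower($NF); gsub(/ /, "_", key)             # "gilbert ryle" -> gilbert_ryle
    file = substr($0, 1, length($0) - length($NF) - 1)  # drop ":match", keeps ":" in names
    if (!seen[key SUBSEP file]++)                       # list each file once per term
      print file > (out "/" key ".list")
  }'
```

Each term ends up in its own list (kant.list, gilbert_ryle.list, hegel.list here), and three.txt appears in none of them.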
> grep doesn't do replacements:
> https://stackoverflow.com/questions/16197406/grep-regex-replace-specific-find-in-text-file
No. Do you want replacements? You haven't asked for that.
> but at least (in my way to understand reality, since it must try such
> searches sequentially) it should give you the index of the match
It can. See -n flag above.
$ find . -type f \
\( -name "*.[Hh][Tt][Mm][Ll]" -o -name "*.[Tt][Xx][Tt]" \) \
-exec grep -ino "Waldo\|Ice Cream" '{}' + | uniq
[...]
./wikipedia.org/Transport_Layer_Security.html:999:Ice Cream
./wikipedia.org/Transport_Layer_Security.html:1623:Ice Cream
./wikipedia.org/Edward_R_Murrow.html:338:Waldo
./wikipedia.org/Firefox_version_history.html:3219:Ice Cream
./wikipedia.org/Firefox_version_history.html:3710:Ice Cream
./wikipedia.org/Firefox_version_history.html:6021:Ice Cream
./wikipedia.org/Rwandan_genocide.html:993:Waldo
./wikipedia.org/Agar.html:61:Ice cream
./wikipedia.org/Agar.html:61:ice cream
./wikipedia.org/Turmeric.html:341:ice cream
./wikipedia.org/There_Will_Be_Blood.html:150:ice cream
./wikipedia.org/Boko_haram.html:1328:ice cream
./wikipedia.org/Phantasm_(franchise).html:126:ice cream
./wikipedia.org/Phantasm_(franchise).html:226:ice cream
./wikipedia.org/Thundarr_the_Barbarian.html:153:Waldo
./wikipedia.org/Ecuador.html:1487:waldo
./wikipedia.org/French_Guiana.html:886:Ice cream
./wikipedia.org/French_Guiana.html:886:ice cream
./wikipedia.org/Henry_david_thoreau.html:136:Waldo
./wikipedia.org/Henry_david_thoreau.html:235:Waldo
[...]
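If file names might contain ':' (or stranger characters), the -Z flag from the man page excerpt can be added to -ino so that a NUL byte, not ':', ends each file name. A sketch, assuming GNU grep (the \| alternation is a GNU extension) and bash for read -d ''; the demo files are invented.

```shell
#!/usr/bin/env bash
# Sketch: the -ino search with -Z added, so a NUL byte (not ':') ends each
# file name and names containing ':' still split correctly.
# The demo tree and its file names are made up.
set -eu
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
printf 'intro\nWaldo was here\n' > "$demo/odd:name.html"
printf 'ice cream\n'             > "$demo/notes.txt"

results=()
# Each record is: file name, NUL, "line:match", newline.
while IFS= read -r -d '' file && IFS= read -r rest; do
  line=${rest%%:*}   # text before the first ':' is the line number
  match=${rest#*:}   # everything after it is the matched text
  results+=("${file##*/} | line $line | $match")
done < <(find "$demo" -type f \( -iname '*.html' -o -iname '*.txt' \) \
           -exec grep -HinoZ 'waldo\|ice cream' {} +)

printf '%s\n' "${results[@]}"
```

The order of the printed lines follows find's traversal order, which is not sorted.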
> and if grep doesn't do that I am sure some other batch utility would
"Batch utility" sounds like COBOL talk, or something else paleographic
and interesting.
> (I have never used sed in my code)
I believe sed is a write-only language that is overwhelmingly useful
at the command line.
If I find myself putting it in a script, I try to find another way.
For example, awk is far nicer to look at.
--
@almightygenie 8 Jun 2020 | @Windex
Thanks, Windex. That's a relief. Your drink is even more refreshing
now that I know that it deplores racism & discrimination.
twitter.com/almightygenie/status/1270096054864809988