
Re: grep: show matching line from pattern file



On Sat, May 28, 2022 at 04:02:39PM -0400, The Wanderer wrote:
> On 2022-05-28 at 15:40, Jim Popovitch wrote:
> > I have a file of regex patterns and I use grep like so:
> > 
> >    ~$ grep -f patterns.txt /var/log/syslog 
> > 
> > What I'd like to get is a listing of all lines, specifically the line
> > numbers of the regexps in patterns.txt, that match entries in
> > /var/log/syslog.   Is there a way to do this?
> 
> I don't know of a standardized way to do that (if anyone else wants to
> suggest one, I'm open to learn), but of course it *can* be done, via
> scripting. Off the top of my head, I came up with the following
> 
> for line in $(seq 1 $(wc -l patterns.txt | cut -d ' ' -f 1)) ; do
>   if grep $(head -n $line patterns.txt | tail -n 1) /var/log/syslog > /dev/null ; then
>     echo $line ;
>   fi
> done

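The quoting here is... completely absent (and that's extremely bad: the
unquoted $(head ...) result is word-split and glob-expanded before grep
ever sees it).  Just as importantly, this runs grep once per pattern, so
a thousand patterns means reading the logfile a thousand times.  One
would ideally like to avoid that, especially if the logfile is large.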

I believe this is the kind of job for which perl is well-suited.  I'm not
great at perl, but I'll give it a shot.

Here's a version with some extra information as output, so I can verify
that it's doing something reasonably close to correct:


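#!/usr/bin/perl
use strict; use warnings;

# Read the patterns, one per line, into @patlist.
# "or" rather than "||", so that a failed open actually reaches die.
my @patlist;
open(my $pats, '<', 'patterns.txt') or die "can't open patterns.txt: $!";
chomp(@patlist = <$pats>);
close $pats;

# Test each input line against every pattern; print index, pattern, line.
while (<>) {
    chomp;
    for (my $i = 0; $i <= $#patlist; $i++) {
        print "$i|$patlist[$i]|$_\n" if /$patlist[$i]/;
    }
}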


Now, to test it, we need a patterns.txt file:


unicorn:~$ cat patterns.txt 
PATH
HOME|~
a...e


And an input (log) file:


unicorn:~$ cat file
zebra
Home, home on the range.
Oops, I meant HOME on the range.

applesauce


And here's what it does:


unicorn:~$ ./foo file
1|HOME|~|Oops, I meant HOME on the range.
2|a...e|applesauce


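Pattern numbers 1 and 2 (the second and third, since the count starts at
0) were matched, so we have a line for each of those.  Pattern 0 (PATH)
matched nothing, so it doesn't appear.  Add 1 to a pattern number to get
its line number in patterns.txt.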

If that's kinda what you wanted, then you can adjust this to do precisely
what you wanted.  It shouldn't take a lot of work, I hope.  Well, I guess
that depends on what you really want.
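
For example, if all you want is the line numbers of the patterns that
matched something, a rough sketch might look like the following.  The
helper name matching_pattern_numbers is something I made up for
illustration, and I'm assuming "line number" means the 1-based line in
patterns.txt.  It also compiles each pattern once with qr// and stops
retesting a pattern after its first hit, which should help when there are
many patterns.  The demo at the bottom uses the sample data from this
message through an in-memory filehandle, so it runs without any files.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Hypothetical helper, not part of the script above: given a reference to
# the pattern list and an open filehandle, return the 1-based line numbers
# of the patterns that matched at least one input line.
sub matching_pattern_numbers {
    my ($patterns, $fh) = @_;

    # Compile each pattern once, rather than on every input line.
    my @compiled = map { qr/$_/ } @$patterns;

    my %seen;
    while (my $line = <$fh>) {
        chomp $line;
        for my $i (0 .. $#compiled) {
            next if $seen{$i};    # already matched once; no need to retest
            $seen{$i} = 1 if $line =~ $compiled[$i];
        }
    }

    # Array indexes start at 0; line numbers in patterns.txt start at 1.
    return map { $_ + 1 } sort { $a <=> $b } keys %seen;
}

# Demo on the sample data from this message, using an in-memory filehandle
# so that the sketch runs without any files on disk.
my @patlist = ('PATH', 'HOME|~', 'a...e');
my $log = "zebra\nHome, home on the range.\n"
        . "Oops, I meant HOME on the range.\n\napplesauce\n";
open(my $fh, '<', \$log) or die "can't open in-memory file: $!";
print "$_\n" for matching_pattern_numbers(\@patlist, $fh);
close $fh;
```

With the sample data this prints 2 and 3, one per line.  For real use,
you'd read patterns.txt into @patlist as in the first script and pass a
handle opened on the logfile instead of the in-memory one.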

Bash is not well-suited to this task, and even if we were to take The
Wanderer's script and fix all the issues in it, it would still be a
vastly inferior solution.  Some tools are just not meant for some jobs.

