Re: quick scripting question - finding occurrence in many lines
On Wed, 2006-11-29 at 09:36 -0900, Ken Irving wrote:
> On Wed, Nov 29, 2006 at 02:32:37PM +0000, michael wrote:
> > I guess a complete rephrase is best.
> >
> > What I want is "how many processors does each WAITING job in lsf queues
> > require?". From 'bhist' I get outputs such as below (see whitespace
> > anywhere in "num Processors") and cannot determine a sure way of always
> > parsing it...
>
> In the brute force perl solution previously shown, just add whitespace
> to the character class, [\s\n-], which is inserted between every target
> character in the regular expression. This would be similar in awk, sed,
> grep, or other tool using regular expressions.
>
> #!/usr/bin/perl -w
> use strict;
> my $source = join '', <>; # get all the data into a string
> my $t = '[\s\n-]'; # character class allowed between each target character
> print "$1\n" while
>     $source =~ m/(\d+)\s+P$t*r$t*o$t*c$t*e$t*s$t*s$t*o$t*r/msg;
>
> Other schemes previously shown would probably work with trivial changes,
> e.g., using tr to delete (-d) or squeeze (-s) runs of spaces or newlines,
> etc.
>
> Unless this is a one-off task (which it seems like it isn't), I'd
> suggest looking into fixing whatever is generating the screwed-up output
> in the first place. Failing that, use tr/sed/python/perl/ruby/BASIC/
> whatever to filter the output to something more sensible, i.e., normalize
> it, and don't try to do it in one step.
>
> Ken
Getting rid of all whitespace brings its own problems: 'bla 12
Processors' becomes a single string with no whitespace in it.
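To illustrate the difference, here is a minimal sketch comparing
deletion with squeezing. The helper names delete_ws and squeeze_ws are
made up for this example; only standard tr options are used.

```shell
# Hypothetical helpers, for illustration only.

# Delete every space and newline: word boundaries are lost.
delete_ws() {
    tr -d ' \n'
}

# Turn newlines into spaces, then squeeze runs of spaces into one:
# word boundaries survive, so the number stays separate from the word.
squeeze_ws() {
    tr -s ' \n' '  '
}

printf 'bla   12\nProcessors\n' | delete_ws    # bla12Processors
echo
printf 'bla   12\nProcessors\n' | squeeze_ws   # bla 12 Processors
echo
```

Squeezing does not repair a word that has been broken in the middle of
a line wrap, so it only helps with the number/word boundary.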
It's a sort of one-off, but I can't fix the LSF queuing system. However,
here's a fix that works (given that I know the amount of whitespace on
the 2nd and subsequent lines):
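function getWAITinfo() {
    # The sed patterns assume a known, fixed amount of whitespace on
    # bhist's continuation lines; adjust the spaces in them to match.
    echo "$jobNum: $(bhist -l "$jobNum" | sed 's/ //g' | sed 's/ ^//' |
        tr -d '\n' | tr ' ' '\n' | grep -B1 Processors | tr '\n' ' ')"
}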
where `bhist -l` is what generates the info to be parsed.
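For anyone who does not know the indent width in advance, a rough
alternative is to combine the two ideas in this thread: normalize the
whitespace first, then match "Processor" with optional spaces between
its letters, as in Ken's regular expression. This is only a sketch. The
function name procs_requested is made up, it assumes a grep that
supports -o (GNU and BSD grep do), and it will not cope with a number
that is itself split across lines.

```shell
# Hypothetical helper: reads `bhist -l`-style text on stdin and prints,
# one per line, the number found in front of each "Processor(s)".
procs_requested() {
    # 1. Normalize: every run of whitespace (including newlines and the
    #    continuation-line indent) becomes a single space.
    # 2. Match digits, spaces, then "Processor" with optional spaces
    #    between the letters, so a word broken by a line wrap still matches.
    # 3. Keep only the leading number of each match.
    tr -s '[:space:]' ' ' |
        grep -oE '[0-9]+ +P *r *o *c *e *s *s *o *r' |
        sed 's/ .*//'
}
```

It could then be used as, for example,
echo "$jobNum: $(bhist -l "$jobNum" | procs_requested)"
with $jobNum set as in the function above.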
Thanks to all.
Michael