
Re: quick scripting question - finding occurrence in many lines



On Fri, Nov 03, 2006 at 09:56:12PM -0500, Douglas Tutty wrote:
> On Fri, Nov 03, 2006 at 08:27:42PM +0000, michael wrote:
> > I've been trying to do this with 'awk' but am hitting probs (not used
> > awk for ages!) so all offers welcome! 
> > 
> > Given a multiple line file, IN, that contains the word Processor
> > (possibly split over 2 lines) I wish to output the field immediately
> > preceding Processor.
> > 
> > eg for
> > 
> > junk info 18 Pro
> > cessor
> > 
> > I wish to get the field '18'
>  
> I've read the replies telling you about awk and it reminds me why I
> never use awk or regular expressions.  My mind doesn't do cryptic.  I
> either do fortran77 or python.  For this I would use python so you can
> lay it out step by step logically.  
> 
> Since it appears that newlines aren't significant, I would get rid of
> them. 
> 
> 	IN = open('IN')
> 	instring = IN.read()
> 	IN.close()
> 
> I would remove all newlines so it was one huge line. 
> 
> 	onelinestring = instring.replace('\n', ' ')
> 	del instring
> 
> Split the string into a list of words
> 
> 	inlist = onelinestring.split()
> 	del onelinestring
> 
> Iterate through the list looking for 'processor'
> 
> 	oldword = ' '
> 	for newword in inlist
> 		if word.lower == 'processor'
> 			print oldword	# the previous word
> 		oldword = newword
> 
> 	del inlist
> 
> So I did it in 8 lines instead of one, but in 10 years I'll still know
> what those 8 lines do.  All the del lines do is free memory as soon as
> possible as there is no need to keep multiple versions of the file
> around.  Internally, I don't know how awk and regular expressions handle
> this.  

Is this pseudo-code or does it actually run?  I had to add some cryptic
noise, I mean ':' characters, in a couple of places, change "word" to 
"newword", and it still didn't seem to work.  The interesting part of
the otherwise mundane problem was that the pattern to match is perhaps
on two different lines.  I don't see how this is addressed in the 
proffered solution.
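
For what it's worth, here is a minimal sketch of one way the split could
be handled.  It assumes Python 3, that the word may break at any letter
boundary with a bare newline, and that the preceding field itself is not
split.  It uses a regular expression rather than the word-list loop
quoted above, and the names SPLIT_WORD, fields_before and
fields_before_in_file are made up for illustration.

```python
import re

WORD = "Processor"

# Allow an optional line break between any two letters of WORD,
# so "Pro\ncessor" is treated the same as "Processor".
SPLIT_WORD = r"\n?".join(re.escape(ch) for ch in WORD)

# Capture the whitespace-delimited field just before the (possibly split)
# word.  IGNORECASE mirrors the .lower() comparison in the quoted code.
PATTERN = re.compile(r"(\S+)\s+" + SPLIT_WORD + r"\b", re.IGNORECASE)


def fields_before(text):
    """Return every field that immediately precedes WORD in text."""
    return PATTERN.findall(text)


def fields_before_in_file(path="IN"):
    """Read the whole file at path and return the matching fields."""
    with open(path) as infile:
        return fields_before(infile.read())
```

Calling fields_before_in_file("IN") and printing each item should give
one field per occurrence.  Simply deleting every newline before splitting
would also rejoin the word, but it would glue together unrelated words
that happen to sit on either side of a line break, which is why the
optional newline is allowed only inside the target word here.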

-- 
Ken Irving, fnkci@uaf.edu


