
Re: quick scripting question - finding occurrence in many lines



On Sat, Nov 04, 2006 at 01:03:14PM -0900, Ken Irving wrote:
> On Fri, Nov 03, 2006 at 09:56:12PM -0500, Douglas Tutty wrote:
> > On Fri, Nov 03, 2006 at 08:27:42PM +0000, michael wrote:
> > > I've been trying to do this with 'awk' but am hitting probs (not used
> > > awk for ages!) so all offers welcome! 
> > > 
> > > Given a multiple line file, IN, that contains the word Processor
> > > (possibly split over 2 lines) I wish to output the field immediately
> > > preceding Processor.
> > > 
> > > eg for
> > > 
> > > junk info 18 Pro
> > > cessor
> > > 
> > > I wish to get the field '18'
> >  
> > I've read the replies telling you about awk and it reminds me why I
> > never use awk or regular expressions.  My mind doesn't do cryptic.  I
> > either do fortran77 or python.  For this I would use python so you can
> > lay it out step by step logically.  
> > 
> > Since it appears that newlines aren't significant, I would get rid of
> > them. 
> > 
> > 	IN = open('IN')
> > 	instring = IN.read()
> > 	IN.close()
> > 
> > I would remove all newlines so it was one huge line. 
> > 
> > 	onelinestring = instring.replace('\n', ' ')
> > 	del instring
> > 
> > Split the string into a list of words
> > 
> > 	inlist = onelinestring.split()
> > 	del onelinestring
> > 
> > Iterate through the list looking for 'processor'
> > 
> > 	oldword = ' '
> > 	for newword in inlist
> > 		if word.lower == 'processor'
> > 			print oldword	# the previous word
> > 		oldword = newword
> > 
> > 	del inlist
> > 
> > So I did it in 8 lines instead of one, but in 10 years I'll still know
> > what those 8 lines do.  All the del lines do is free memory as soon as
> > possible as there is no need to keep multiple versions of the file
> > around.  Internally, I don't know how awk and regular expressions handle
> > this.  
> 
> Is this pseudo-code or does it actually run?  I had to add some cryptic
> noise, I mean ':' characters, in a couple of places, change "word" to 
> "newword", and it still didn't seem to work.  The interesting part of
> the otherwise mundane problem was that the pattern to match is perhaps
> on two different lines.  I don't see how this is addressed in the 
> proffered solution.

Ok, a bit of python hacking later...  The same technique shown previously
(in awk) can be used:

    #!/usr/bin/python
    olderword = ' '
    oldword = ' '
    for newword in open('IN').read().split():
        if newword.lower() == 'processor':
            print(oldword)   # the previous word
        # otherwise try combining old and new word...
        elif oldword.lower() + newword.lower() == 'processor':
            print(olderword)   # the previouser word...
        olderword = oldword
        oldword = newword

There doesn't seem to be any need for storing/deleting variables for
handling the input, nor for replacing newlines with spaces. 
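
For anyone who wants to try this without creating an IN file first, here
is a rough sketch of the same loop wrapped in a function that takes the
text as a string.  The name fields_before and its target parameter are
made up for illustration and are not from anything earlier in the thread;
it assumes Python 3 and, like the script above, only copes with the word
being split into at most two pieces.

```python
def fields_before(text, target='processor'):
    """Return the field preceding each occurrence of target in text.

    target matches either a single word or two adjacent
    whitespace-separated words that join to it (e.g. 'Pro' + 'cessor').
    Hypothetical helper, written only to illustrate the loop above.
    """
    found = []
    olderword = oldword = ''
    for newword in text.split():          # split() also splits on newlines
        if newword.lower() == target:
            found.append(oldword)         # word just before 'Processor'
        elif (oldword + newword).lower() == target:
            found.append(olderword)       # word before the split halves
        olderword, oldword = oldword, newword
    return found


print(fields_before('junk info 18 Pro\ncessor\nmore 7 Processor'))
```

If the word is the very first field in the input, the function records an
empty string for it, since there is no preceding field to report.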

Ken
-- 
Ken Irving, fnkci@uaf.edu


