Re: quick scripting question - finding occurrence in many lines

To: debian-user@lists.debian.org
Subject: Re: quick scripting question - finding occurrence in many lines
From: Ken Irving <fnkci@uaf.edu>
Date: Sat, 4 Nov 2006 19:51:29 -0900
Message-id: <20061105045129.GA10333@localhost>
Mail-followup-to: debian-user@lists.debian.org
In-reply-to: <20061104220314.GA28932@localhost>
References: <1162585662.5834.2.camel@manchester-campaigns> <20061104025612.GB2712@pluto> <20061104220314.GA28932@localhost>

On Sat, Nov 04, 2006 at 01:03:14PM -0900, Ken Irving wrote:
> On Fri, Nov 03, 2006 at 09:56:12PM -0500, Douglas Tutty wrote:
> > On Fri, Nov 03, 2006 at 08:27:42PM +0000, michael wrote:
> > > I've been trying to do this with 'awk' but am hitting probs (not used
> > > awk for ages!) so all offers welcome! 
> > > 
> > > Given a multiple line file, IN, that contains the word Processor
> > > (possibly split over 2 lines) I wish to output the field immediately
> > > preceding Processor.
> > > 
> > > eg for
> > > 
> > > junk info 18 Pro
> > > cessor
> > > 
> > > I wish to get the field '18'
> >  
> > I've read the replies telling you about awk and it reminds me why I
> > never use awk or regular expressions.  My mind doesn't do cryptic.  I
> > either do fortran77 or python.  For this I would use python so you can
> > lay it out step by step logically.  
> > 
> > Since it appears that newlines aren't significant, I would get rid of
> > them. 
> > 
> > 	IN = open('IN')
> > 	instring = IN.read()
> > 	IN.close()
> > 
> > I would remove all newlines so it was one huge line. 
> > 
> > 	onelinestring = instring.replace('\n', ' ')
> > 	del instring
> > 
> > Split the string into a list of words
> > 
> > 	inlist = onelinestring.split()
> > 	del onelinestring
> > 
> > Iterate through the list looking for 'processor'
> > 
> > 	oldword = ' '
> > 	for newword in inlist
> > 		if word.lower == 'processor'
> > 			print oldword	# the previous word
> > 		oldword = newword
> > 
> > 	del inlist
> > 
> > So I did it in 8 lines instead of one, but in 10 years I'll still know
> > what those 8 lines do.  All the del lines do is free memory as soon as
> > possible as there is no need to keep multiple versions of the file
> > around.  Internally, I don't know how awk and regular expressions handle
> > this.  
> 
> Is this pseudo-code or does it actually run?  I had to add some cryptic
> noise, I mean ':' characters, in a couple of places, change "word" to 
> "newword", and it still didn't seem to work.  The interesting part of
> the otherwise mundane problem was that the pattern to match is perhaps
> on two different lines.  I don't see how this is addressed in the 
> proffered solution.

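Ok, a bit of Python hacking later...  The same technique shown previously
(in awk) can be used: remember the last two words, and test each new word
both on its own and joined to the word before it: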

    #!/usr/bin/python
    olderword = ' '
    oldword = ' '
    for newword in open('IN').read().split():
        if newword.lower() == 'processor':
            print(oldword)   # the previous word
        else: # try combining new and old word...
            if oldword.lower() + newword.lower() == 'processor':
                print(olderword)   # the previouser word...
        olderword = oldword
        oldword = newword

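There doesn't seem to be any need to store and then delete intermediate
variables while handling the input, nor to replace newlines with spaces:
split() with no arguments already splits on any run of whitespace,
newlines included.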
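
The loop can also be wrapped in a function that takes the text as a
string, which makes it easy to check against the sample from the original
post.  This is only a sketch, assuming Python 3; the name preceding_fields
and its target parameter are made up here for illustration:

```python
def preceding_fields(text, target='processor'):
    """Return the field before each occurrence of target in text.

    The target may appear whole, or split across two whitespace-separated
    pieces (e.g. 'Pro' at the end of one line and 'cessor' on the next).
    """
    found = []
    olderword = oldword = ''
    for newword in text.split():          # split() also breaks on newlines
        if newword.lower() == target:
            found.append(oldword)         # the previous word
        elif (oldword + newword).lower() == target:
            found.append(olderword)       # the word before the split pair
        olderword, oldword = oldword, newword
    return found


# Sample: one split occurrence and one whole occurrence.
sample = "junk info 18 Pro\ncessor\nmore junk 7 Processor\n"
print(preceding_fields(sample))
```

On that sample it prints ['18', '7'].  Returning a list rather than
printing inside the loop keeps the matching logic separate from the
output, so the same function can be fed open('IN').read() unchanged.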

Ken
-- 
Ken Irving, fnkci@uaf.edu
