Re: quick scripting question - finding occurrence in many lines
On Sat, Nov 04, 2006 at 01:03:14PM -0900, Ken Irving wrote:
> On Fri, Nov 03, 2006 at 09:56:12PM -0500, Douglas Tutty wrote:
> > On Fri, Nov 03, 2006 at 08:27:42PM +0000, michael wrote:
> > > I've been trying to do this with 'awk' but am hitting probs (not used
> > > awk for ages!) so all offers welcome!
> > >
> > > Given a multiple line file, IN, that contains the word Processor
> > > (possibly split over 2 lines) I wish to output the field immediately
> > > preceding Processor.
> > >
> > > eg for
> > >
> > > junk info 18 Pro
> > > cessor
> > >
> > > I wish to get the field '18'
> >
> > I've read the replies telling you about awk and it reminds me why I
> > never use awk or regular expressions. My mind doesn't do cryptic. I
> > either do fortran77 or python. For this I would use python so you can
> > lay it out step by step logically.
> >
> > Since it appears that newlines aren't significant, I would get rid of
> > them.
> >
> > IN = open('IN')
> > instring = IN.read()
> > IN.close()
> >
> > I would remove all newlines so it was one huge line.
> >
> > onelinestring = instring.replace('\n', ' ')
> > del instring
> >
> > Split the string into a list of words
> >
> > inlist = onelinestring.split()
> > del onelinestring
> >
> > Iterate through the list looking for 'processor'
> >
> > oldword = ' '
> > for newword in inlist
> >     if word.lower == 'processor'
> >         print oldword # the previous word
> >     oldword = newword
> >
> > del inlist
> >
> > So I did it in 8 lines instead of one, but in 10 years I'll still know
> > what those 8 lines do. All the del lines do is free memory as soon as
> > possible as there is no need to keep multiple versions of the file
> > around. Internally, I don't know how awk and regular expressions handle
> > this.
>
> Is this pseudo-code or does it actually run? I had to add some cryptic
> noise, I mean ':' characters, in a couple of places, change "word" to
> "newword", and it still didn't seem to work. The interesting part of
> the otherwise mundane problem was that the pattern to match is perhaps
> on two different lines. I don't see how this is addressed in the
> proffered solution.
Ok, a bit of python hacking later... The same technique shown previously
(in awk) can be used:
#!/usr/bin/python
olderword = ' '
oldword = ' '
# split() with no arguments splits on any whitespace, newlines included
for newword in open('IN').read().split():
    if newword.lower() == 'processor':
        print(oldword)  # the previous word
    # otherwise try combining new and old word...
    elif oldword.lower() + newword.lower() == 'processor':
        print(olderword)  # the previouser word...
    olderword = oldword
    oldword = newword
There doesn't seem to be any need for storing/deleting variables for
handling the input, nor for replacing newlines with spaces.
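
For anyone who wants to try the idea without an IN file on hand, here is
a minimal sketch of the same keep-the-last-two-words technique wrapped in
a function. It assumes Python 3. The name fields_before and its target
parameter are made up for illustration and appear nowhere earlier in the
thread. Unlike the script above, it skips a match that has no preceding
field instead of printing a blank.

```python
def fields_before(text, target="processor"):
    """Return the fields immediately preceding each occurrence of target.

    An occurrence is either a single word equal to target, or two
    adjacent words that join to form target (for example, the word was
    split across a line break). Matching is case-insensitive.
    """
    target = target.lower()
    found = []
    older = old = None  # the two most recently seen words
    for new in text.split():  # split() treats newlines like any whitespace
        if new.lower() == target:
            if old is not None:  # nothing precedes a match on the first word
                found.append(old)
        elif old is not None and (old + new).lower() == target:
            if older is not None:
                found.append(older)
        older, old = old, new
    return found


if __name__ == "__main__":
    sample = "junk info 18 Pro\ncessor\n"
    print(fields_before(sample))
```

Because the two halves are simply concatenated and compared, the split
can fall anywhere in the word, but only a split into exactly two pieces
is recognised.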
Ken
--
Ken Irving, fnkci@uaf.edu