Re: quick scripting question - finding occurrence in many lines

To: Andrew Sackville-West <andrew@farwestbilliards.com>, debian-user@lists.debian.org
Subject: Re: quick scripting question - finding occurrence in many lines
From: John O'Hagan <johnohagan@iprimus.com.au>
Date: Wed, 8 Nov 2006 02:51:20 +1100
Message-id: <200611080251.20636.johnohagan@iprimus.com.au>
In-reply-to: <20061106155502.GA19081@localhost.localdomain>
References: <1162585662.5834.2.camel@manchester-campaigns> <200611070100.34877.johnohagan@iprimus.com.au> <20061106155502.GA19081@localhost.localdomain>

On Tuesday 07 November 2006 02:55, Andrew Sackville-West wrote:
> On Tue, Nov 07, 2006 at 01:00:34AM +1100, John O'Hagan wrote:

[...]

> > You're right; but the OP, Michael, gave the above scenario as his
> > problem. If your situation were the case, though, I guess we could use tr
> > -d '-' to get rid of all the hyphens first as well.
>
> the problem there is what if the desired result word includes a
> hyphen, then you'll have modified your result. I think you should go
> ahead and tr -d '\n' | tr ' ' '\n' | and then grep for a regex of
> Processor that allows for hyphens. you could limit it to the usual
> hyphen locations Pro-cess-or or is it Pro-ces-sor?
>
> here's another problem. target word is at end of line with processor
> at beginning of next line. There is only a newline between them and
> so the result becomes
>
> test
> word
> target-wordProcessor
> other
> junk
>
> your grep will return 'word' instead of 'target-word'. You'd have to
> use an old find-replace trick
>
> tr '\n' ' ' | tr -s ' ' '\n' | grep -B1 'Pro-*cess-*or' | grep -v
> 'Pro-*cess-*or\|--'
>
>
> this replaces newlines with spaces and then replaces all single or
> multiple occurrences of spaces with newlines. this allows that edge
> case above to come through properly. Then I think the grep is right
> to match zero or more hyphens in processor.
>

I tried this, and found that replacing the newlines with spaces stops the grep
from working, because it puts a space in the middle of any occurrence of
"Processor" that is split across two lines. But I see what you mean about the
edge case. I think this version takes care of it, plus it is hyphen-agnostic:

tr -d '\n' <IN | sed 's/P-*r-*o-*c-*e-*s-*s-*o-*r/ Processor/g' |
tr -s ' ' '\n' | grep -B1 'Processor' | grep -v 'Processor\|--'

This removes the newlines, replaces every (non-)hyphenated "Processor" with a
space followed by "Processor", then does the grep. And here's a Python
version using the re module to deal with the hyphens (the edge case takes
care of itself here):

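import re

# Print the word preceding each (optionally hyphenated) "Processor".
with open('IN') as f:
    text = f.read().replace('\n', '')

for chunk in re.split('P-?r-?o-?c-?e-?s-?s-?o-?r', text)[:-1]:
    words = chunk.split()
    if words:  # skip chunks with no word before "Processor"
        print(words[-1])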
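
For anyone who wants to poke at the difference between deleting the newlines
and turning them into spaces, here is a rough sketch of the same idea wrapped
in a function. The function name, the newline_replacement parameter and the
sample text are all made up for illustration; only the regex and the
split-and-take-the-last-word approach come from the snippet above.

```python
import re

# Optionally hyphenated "Processor", e.g. "Pro-cessor" or "Proces-sor".
PATTERN = re.compile(r'P-?r-?o-?c-?e-?s-?s-?o-?r')


def words_before(text, newline_replacement=''):
    """Return the word preceding each match of PATTERN in text.

    newline_replacement is a hypothetical knob: '' deletes newlines,
    ' ' turns them into spaces instead.
    """
    joined = text.replace('\n', newline_replacement)
    # Everything after the last match is not followed by "Processor".
    chunks = PATTERN.split(joined)[:-1]
    # Ignore chunks that contain no word at all.
    return [chunk.split()[-1] for chunk in chunks if chunk.split()]


# Made-up sample: the first target is the last word on its line, and the
# second "Processor" is hyphenated across a line break.
SAMPLE = (
    "test word target-word\n"
    "Processor other junk\n"
    "alpha beta Pro-\n"
    "cessor gamma\n"
)

print(words_before(SAMPLE))       # newlines deleted
print(words_before(SAMPLE, ' '))  # newlines turned into spaces
```

With the newlines deleted, both occurrences are found; with spaces, the one
split across the line break ("Pro- cessor") no longer matches. One thing to
watch: deleting newlines also glues the last word of one line to the first
word of the next, so a target word that starts a line would come out attached
to the word before it.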

Have we done this to death yet? :)

Regards,

John
