Re: quick scripting question - finding occurrence in many lines



On Wed, Nov 08, 2006 at 02:51:20AM +1100, John O'Hagan wrote:
> 
> I tried this, and found that replacing the newlines with spaces stops the grep 
> from working because it puts spaces in the middle of any occurrences 
> of "Processor", but I see what you mean about the edge case. I think this 
> version takes care of it, plus it is hyphen-agnostic:
> 

> tr  -d '\n'  <IN | sed s/P-*r-*o-*c-*e-*s-*s-*o-*r/' Processor'/g | 
> tr -s ' ' '\n' | grep -B1 'Processor' |  grep -v 'Processor\|--'
> 
> removing newlines, replacing all cases of (non-)hyphenated "Processor" with a 
> space followed by "Processor", then doing the grep. And here's a Python 
> version using the re module to deal with the hyphens ( the edge case takes 
> care of itself here):
> 
> import re
> 
> for i in re.split('P-?r-?o-?c-?e-?s-?s-?o-?r', 
> open('IN').read().replace('\n', ''))[0:-1]:
> 	print i.split()[-1]


huh, I'm not sure. I played with it a little and here's another
problem

here is some testing
data Processor

will return 'testingdata' because the newlines get stripped out,
leaving no space between the words. so...

first, replace all '-\n' with '' so we dehyphenate any hyphenated
words split by a newline. there will be some words that should be
hyphenated but lose that hyphen; however, I think that's probably a
pretty rare case, and it leaves any mid-line hyphenated words alone. it
also makes it easier to grep, as we can ignore the hyphens in
Processor. next, replace all '\n' with ' ' so that we avoid the above
problem. then replace every run of one or more ' ' with '\n' to split
the words onto separate lines, and finally grep away.

sed -e :a -e '/-$/N; s/-\n//; ta' IN | tr '\n' ' ' | tr -s ' ' '\n' |
grep -B1 'Processor' | grep -v 'Processor\|--'
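
for what it's worth, the same steps can be sketched in python too. this
is only a rough sketch under a few assumptions: the helper name
words_before is made up for illustration, the match on "Processor" is
case-sensitive like the grep, and any hyphen at the very end of a line
is treated as a line-break hyphen.

```python
def words_before(text, target="Processor"):
    """Return the word preceding each occurrence of target (hypothetical helper)."""
    # step 1: rejoin words hyphenated across a line break ('-\n' -> '')
    text = text.replace("-\n", "")
    # steps 2 and 3: split() with no arguments breaks on any run of
    # whitespace, so newlines and repeated spaces are handled together
    words = text.split()
    # step 4: keep the word just before each word containing target,
    # skipping words that themselves contain it (like the final grep -v)
    return [
        words[i - 1]
        for i in range(1, len(words))
        if target in words[i] and target not in words[i - 1]
    ]


if __name__ == "__main__":
    sample = "here is some testing\ndata Pro-\ncessor and a fast Processor\n"
    print(words_before(sample))
```

with the test data above it should pick out 'data' rather than
'testingdata', since the line break counts as whitespace instead of
being deleted.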

> 
> Have we done this to death yet? :)

there must be more. I haven't seen any perl junkies provide us with
some permutation of ($*&#^&*%^^@@Processor%^&^$%^%#$&^$%*&^% that
spits the answer right out. ^^^^----- that's not perl code BTW, just
random shifted number-row. but it looks like perl eh? hehe

A
