
Re: quick scripting question - finding occurrence in many lines



On Thu, Nov 09, 2006 at 12:52:57AM +1100, John O'Hagan wrote:
> >
> > tr -d '-\n' <IN | tr '\n' ' ' | tr -s ' ' '\n' | grep -B1 'Processor' |
> > grep -v 'Processor\|--'
> >
> 
> [...]
> 
> Aha! You're right, my lines fail on the edge cases, and also when the target 
> word is hyphenated.
> 
> Your ingenious approach didn't always work either [1]; but it revealed (to me) 

yup. okay, another one below

> that there will be unresolvable ambiguities in the IN file unless:
>  
> EITHER: A) lines are broken arbitrarily without hyphenation, in which case 
> newlines have no significance, spaces between words must be preserved and we can 
> use:
> 
> #tr -d '\n' < IN | tr ' ' '\n' | grep -B1 Processor | grep -v 'Processor\|--'
> 
> or in Python:
> 
> #for i in open('IN').read().replace('\n', '').split('Processor')[:-1]:
> #	print(i.split()[-1])
> 
> OR: B) broken words are hyphenated, and unhyphenated newlines are equivalent 
> to spaces, in which case we could use something like:
> 
> ----------------
> while IFS= read -r i; do
> 	if [[ $i == *- ]]; then
> 		# Line ends in a hyphen: drop it so the next line joins on directly.
> 		printf '%s' "${i%-}"
> 	else
> 		# Otherwise the line break stands for a space.
> 		printf '%s ' "$i"
> 	fi
> done < IN | tr ' ' '\n' | grep -B1 'Processor' |
> grep -v 'Processor\|--'
> --------------------------------
> 
> This removes hyphens at the end of lines or else adds a space, which converts 

here's a simpler way to do that, I think ;)

tr '\n' ' ' < IN | sed 's/- //g' | tr ' ' '\n' | grep -B1 'Processor' |
grep -v 'Processor\|--'

Replace the newlines with spaces, then use sed rather than tr to strip
occurrences of '- ', since sed matches the two-character sequence where
tr only works on single characters. The assumption here is that hyphens
don't appear at the end of words, so any '- ' occurs only because our
tr '\n' ' ' turned the newline after a line-break hyphen into a space.
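
For comparison with the Python version quoted earlier, here is a rough
sketch of the same three steps in Python. The function name words_before
and the sample text are made up for illustration, and it only
approximates the pipeline: split() collapses runs of spaces, whereas
tr ' ' '\n' would leave empty lines for grep -B1 to pick up.

```python
def words_before(text, target="Processor"):
    """Return the word preceding each word that contains target."""
    # Step 1: newlines become spaces (tr '\n' ' ').
    joined = text.replace("\n", " ")
    # Step 2: a hyphen followed by a space is assumed to be a line-break
    # hyphen, so both characters are dropped (sed 's/- //g').
    joined = joined.replace("- ", "")
    # Step 3: one word per item (tr ' ' '\n', minus the empty lines).
    words = joined.split()
    found = []
    for i, word in enumerate(words):
        # grep -B1 keeps the line before each match; grep -v then throws
        # away any line that itself contains the target.
        if target in word and i > 0 and target not in words[i - 1]:
            found.append(words[i - 1])
    return found


sample = "the Intel Pro-\ncessor is fast\nand the AMD\nProcessor too\n"
print(words_before(sample))  # ['Intel', 'AMD']
```

The same caveat applies as for the shell version: a word that really
does end in a hyphen just before a space gets glued to the next word.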


So I wonder what happened to the OP? Is he just watching and waiting for
the right solution, or is he long gone?

A 
> 
> 
> [1] I tried Andrew's solution above and found that it worked reliably only on 
> the unhyphenated case, I think because tr treats its arguments as character 
> sets, not expressions, so that tr -d '\-\n' (note the escape required for the 
> hyphen) deletes any hyphens or newlines, not just that combination.

yeah, that's what happens when you only think about the problem and
don't actually test it.

the above was briefly tested...
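
The point in [1] about character sets versus sequences can be shown
with a small Python sketch. The helper names delete_chars and
delete_sequence are made up for illustration; the first imitates what
tr -d does with a set of characters, the second what a sequence match
such as sed's does.

```python
def delete_chars(text, chars):
    """Delete every character in chars wherever it occurs, like tr -d."""
    return text.translate({ord(c): None for c in chars})


def delete_sequence(text, seq):
    """Delete only exact occurrences of the whole sequence seq."""
    return text.replace(seq, "")


sample = "dual-core Pro-\ncessor\n"
print(repr(delete_chars(sample, "-\n")))     # 'dualcore Processor'
print(repr(delete_sequence(sample, "-\n")))  # 'dual-core Processor\n'
```

The set-based version also eats the hyphen in "dual-core" and every
newline, which is why it only behaves on input without hyphens.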

A
