
Re: quick scripting question - finding occurrence in many lines

On Thu, Nov 09, 2006 at 12:52:57AM +1100, John O'Hagan wrote:
> >
> > tr -d '-\n' <IN | tr '\n' ' ' | tr -s ' ' '\n' | grep -B1 'Processor' |
> > grep -v 'Processor\|--'
> > 
> [...]
> Aha! You're right, my lines fail on the edge cases, and also when the target 
> word is hyphenated.
> Your ingenious approach didn't always work either [1]; but it revealed (to me) 

yup. okay, another one below

> that there will be unresolvable ambiguities in the IN file unless:
> EITHER: A) lines are broken arbitrarily without hyphenation, in which case 
> newlines have no significance, spaces between words must be preserved and we can 
> use:
> #tr -d '\n' < IN | tr ' ' '\n' | grep -B1 Processor | grep -v 'Processor\|--'
> or in Python:
> #for i in open('IN').read().replace('\n', '').split('Processor')[0:-1]:
> #	print(i.split()[-1])
> OR: B) broken words are hyphenated, and unhyphenated newlines are equivalent 
> to spaces, in which case we could use something like:
> ----------------
> while read i ; do
> 	if [[ $(echo "$i" | grep \\-\$ ) ]]; then
> 		i=$( echo "$i" | sed s/-\$//) 
> 		echo "$i" 
> 	else echo "$i"' '
> 	fi
> done < IN | tr -d '\n' | tr ' ' '\n' | grep -B1 'Processor' | 
> grep -v 'Processor\|--'
> --------------------------------
> This removes hyphens at the end of lines or else adds a space, which converts 

here's a simpler way to do that, I think ;)

tr '\n' ' ' < IN | sed 's/- //g' | tr ' ' '\n' | grep -B1 'Processor' |
grep -v 'Processor\|--'

replace the newlines with spaces, then use sed, since it can match the
two-character sequence '- ' where tr only works on single characters.
The assumption here is that hyphens don't appear at the end of words:
every '- ' exists only because our tr '\n' ' ' turned the newline after
a hyphenated line break into a space.
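
for anyone who wants to check it, here is a minimal sketch on a made-up
sample. The sample text and the temp file are my own assumptions, not the
OP's data, and it relies on grep supporting -B and \| (GNU grep does):

```shell
# Hypothetical sample: "Pentium" is hyphenated across a line break,
# "Athlon" is separated from "Processor" by a plain newline.
in_file=$(mktemp)
printf '%s\n' 'the Pen-' 'tium Processor and the Athlon' 'Processor are here' > "$in_file"

# newlines -> spaces, drop the '- ' left by hyphenated breaks,
# one word per line, then keep the word before each 'Processor'.
result=$(tr '\n' ' ' < "$in_file" | sed 's/- //g' | tr ' ' '\n' |
  grep -B1 'Processor' | grep -v 'Processor\|--')

printf '%s\n' "$result"
rm -f "$in_file"
```

on that sample it prints Pentium and Athlon, one per line. A word that
legitimately ends in a hyphen followed by a space would be glued to the
next word, which is the assumption stated above.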

So I wonder what happened to the OP? Is he just watching and waiting for
the right solution, or is he long gone?

> [1] I tried Andrew's solution above and found that it only worked reliably on 
> the unhyphenated case, I think because tr treats its arguments as character 
> sets, not expressions, so that tr -d '\-\n' (note the escape required for the 
> hyphen) deletes any hyphens or newlines, not just that combination.

yeah, that's what happens when you only think about the problem and
don't actually test it.

the above was briefly tested...
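
P.S. the character-set behaviour described in [1] is easy to see with a
small sketch. The input string is made up for illustration, and I am
assuming a tr that accepts '\-' as a literal hyphen (GNU tr does):

```shell
# tr -d removes every character in the set, wherever it occurs:
# both the mid-line hyphen in "multi-core" and the hyphen at the
# line break disappear, along with all newlines.
out=$(printf 'multi-core Pro-\ncessor\n' | tr -d '\-\n')
printf '%s\n' "$out"
```

this prints "multicore Processor": the broken word is rejoined, but the
legitimate hyphen in "multi-core" is lost as well.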

