On Thu, Nov 09, 2006 at 12:52:57AM +1100, John O'Hagan wrote:
> >
> > tr -d '-\n' <IN | tr '\n' ' ' | tr -s ' ' '\n' | grep -B1 'Processor' |
> > grep -v 'Processor\|--'
> >
>
> [...]
>
> Aha! You're right, my lines fail on the edge cases, and also when the target
> word is hyphenated.
>
> Your ingenious approach didn't always work either [1]; but it revealed (to me)
yup. okay, another one below
> that there will be unresolvable ambiguities in the IN file unless:
>
> EITHER: A) lines are broken arbitrarily without hyphenation, in which case
> newlines have no significance, spaces between words must be preserved and we can
> use:
>
> #tr -d '\n' < IN | tr ' ' '\n' | grep -B1 Processor | grep -v 'Processor\|--'
>
> or in Python:
>
> #for i in open('IN').read().replace('\n', '').split('Processor')[0:-1]:
> #    print(i.split()[-1])
>
> OR: B) broken words are hyphenated, and unhyphenated newlines are equivalent
> to spaces, in which case we could use something like:
>
> ----------------
> while read i ; do
>
> if [[ $(echo "$i" | grep \\-\$ ) ]]; then
>
> i=$( echo "$i" | sed s/-\$//)
> echo "$i"
> else echo "$i"' '
> fi
>
> done < IN | tr -d '\n' | tr ' ' '\n' | grep -B1 'Processor' |
> grep -v 'Processor\|--'
> --------------------------------
>
> This removes hyphens at the end of lines or else adds a space, which converts
here's a simpler way to do that, I think ;)
tr '\n' ' ' < IN | sed 's/- //g' | tr ' ' '\n' | grep -B1 'Processor' |
grep -v 'Processor\|--'
replace the newlines with spaces, then use sed to strip occurrences of
'- ', since sed can match that two-character sequence where tr only
matches single characters. The assumption here is that hyphens don't
appear at the end of words, so any '- ' is there only because we
created it with our tr '\n' ' '.
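for anyone who wants to try this without the original file, here's a minimal sketch. The sample text and the function name before_processor are made up for illustration, and I'm assuming GNU grep for -B1 and the \| alternation.

```shell
# Hypothetical helper wrapping the pipeline above; reads text on stdin.
before_processor() {
    tr '\n' ' ' |              # join all lines, newline -> space
    sed 's/- //g' |            # rejoin words that were split as "Pro-" / "cessor"
    tr ' ' '\n' |              # one word per line
    grep -B1 'Processor' |     # each match plus the word before it
    grep -v 'Processor\|--'    # drop the matches and grep's "--" group separators
}

# Made-up sample input with one word hyphenated across a line break.
printf '%s\n' 'The Intel Pro-' 'cessor and the AMD Processor' 'are both here' |
    before_processor
# prints:
# Intel
# AMD
```

the broken "Pro-" / "cessor" pair is rejoined before the text is split into words, so both occurrences are found.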
So I wonder what happened to the OP? Is he just watching, waiting for
the right solution, or is he long gone?
A
>
>
> [1] I tried Andrew's solution above and found that it worked consistently
> only on the unhyphenated case, I think because tr treats its arguments as
> character sets, not expressions, so that tr -d '\-\n' (note the escape
> required for the hyphen) deletes any hyphens or newlines, not just that
> combination.
yeah, that's what happens when you only think about the problem and
don't actually test it.
the above was briefly tested...
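to make the point in footnote [1] concrete, here's a small sketch of the difference. The function names delete_set and join_broken and the sample text are made up, and the set is written as '\n-' so the hyphen needs no escaping; I'm assuming that is the same set as '\-\n'.

```shell
# Hypothetical helper: delete every '-' and every newline (a character set).
# Written as '\n-' so the hyphen needs no escaping; assumed equivalent to '\-\n'.
delete_set() {
    tr -d '\n-'
}

# Hypothetical helper: turn newlines into spaces, then remove only the
# two-character sequence "- " left behind by a hyphenated line break.
join_broken() {
    tr '\n' ' ' | sed 's/- //g'
}

# Made-up input: one genuinely hyphenated word, one word broken across lines.
printf 'dual-core Pro-\ncessor\n' | delete_set; echo
# prints: dualcore Processor

printf 'dual-core Pro-\ncessor\n' | join_broken; echo
# prints: dual-core Processor   (followed by a trailing space)
```

the tr version also eats the hyphen in "dual-core" and the newline that should have become a space, which is why it only holds up when nothing in the file is hyphenated.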
A