
SOLVED: another script query (perl?)



On Fri, Sep 07, 2007 at 08:26:50AM -0700, tabris wrote:

> Richard Lyons wrote:
> > On Fri, Sep 07, 2007 at 07:19:17AM -0700, tabris wrote:
> >> Richard Lyons wrote:
[...]
> >>> I need a script to read a text file (actually tex) and parse lines of a
> >>> table that may or may not span newline characters in the file.
> >>> Basically, there are lines of the form
> >>>
> >>>    {some text} & {some more text} & {text c} & {text d} \\
> >>>
> >>> where the braces are only for clarity and do not occur in the files, and
> >>> where the bits of text may include whitespace which may include newline
> >>> characters. There may also be escaped ampersands in the text ('\&'), and
> >>> the text fragments may be empty.
> >>>
> >>> I suspect perl may be the way forward.  I need to be able to read each
> >>> file, parse each set of three ampersands with a double backslash
> >>> breaking it into four substrings, manipulate the substrings and write
> >>> the file anew.  A typical manipulation will be to take text c and copy
> >>> it to text d. I shall also try to strip leading and trailing whitespace
> >>> to tidy up the file.
> >>>   
> >>>       
> >> please give real examples of the text you have, as well as more info about
> >> what processing you will do with it.
[...]
> >
> >     \mbox{Walls} &Plain plastered and painted white. &GC but to soiled
> > around switch, RHS as entering, HL marks. OW nail near centre, some
> >  blue-tac remnants. LHW hairline cracking at HL. pipe boxing far RH
> >    corner, white painted, cracks at junctions.  & \\
> >
> > and here is another:
> >
> >    &catch, diecast \& epoxy coated with security lock & GC &\\

> >   
> well, I'd say something along these lines assuming that you have $l
> populated with the entire piece you want.
> Also note that this attempts to avoid use of regexps where possible, as
> they tend to be slow and hard to read. Not that I dislike regexps, but I
> don't think they're necessary here. Also note that none of this code has
> been tested, it's the product of about 5 minutes of hacking.
> 
> my @phrases = split('&', $l);
> {
>     my @tmp;
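>     while (@phrases) {    # test the array itself so empty fragments are not skipped
>         my $phrase = shift @phrases;
>         if (substr($phrase, -1) eq '\\') {    # ends in a backslash: the '&' was escaped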
>            my $tmp = $phrase .'&'. (shift @phrases);
>         }
>         push @tmp, $phrase;
>     }
>     @phrases = @tmp;
> }
> 
> # remove trailing or leading whitespace
> foreach my $phrase (@phrases) {
>     $phrase =~ s/^\s+//; # remove all leading whitespace
>     $phrase =~ s/\s+$//; # remove all trailing whitespace
>     $phrase =~ s/\n/ /g; # change all new-line chars to spaces
> }
> 
> # now reconstruct your text however you want it.
> # I have a good (free, public-domain) line splitter if you need one.

I would like that.

The script fragments were a great help, thanks.  The main bug was that
$tmp is unnecessary; it should just be $phrase.  I'd post the whole
script (well, two actually -- I used a bash script to set things up and
just called the perl to do the dirty work), but it is so specific that
it would be of no use to anyone else.  A pity, really, after spending
so many hours on it.  Still, at least I learned some perl.  I would
point anyone else looking for a beginners' introduction to perl to

    http://www.perltraining.com.au/notes.html

as well as 

    http://mailman.linuxchix.org/pipermail/courses/2003-September/001344.html

This second URL is lesson 10, which I have given because the earlier
lessons do not index forward.

There is, of course, a mass of other stuff, not least http://cpan.org and
all the perldoc info. 

So thanks, case closed.
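
For anyone finding this in the archive, here is a minimal sketch of the
splitting step with the $phrase fix applied.  It is not the script
actually used: split_row and the sample $row are hypothetical names made
up for illustration, and the sketch assumes the row has already been
read up to (but not including) the closing double backslash.  It also
treats any '&' preceded by a backslash as escaped, which is a
simplification.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one table row into cells on unescaped '&'.
# $row is assumed to hold the row text without its terminating '\\'.
sub split_row {
    my ($row) = @_;
    my @pieces = split /&/, $row, -1;   # limit -1 keeps trailing empty cells
    my @cells;
    while (@pieces) {                   # test the array, so empty cells survive
        my $cell = shift @pieces;
        # A piece ending in a backslash means the '&' was escaped ('\&'):
        # glue the next piece back on, repeating if that one is escaped too.
        while (@pieces && $cell =~ /\\\z/) {
            $cell .= '&' . shift @pieces;
        }
        push @cells, $cell;
    }
    for my $cell (@cells) {             # $cell aliases the array element
        $cell =~ s/\s+/ /g;             # collapse newlines and runs of whitespace
        $cell =~ s/^ //;                # strip leading whitespace
        $cell =~ s/ $//;                # strip trailing whitespace
    }
    return @cells;
}

# Sample input modelled on the second example row in this thread.
my $row   = "   &catch, diecast \\& epoxy coated\n with security lock & GC &";
my @cells = split_row($row);
$cells[3] = $cells[2];                  # the "copy text c to text d" manipulation
print join(' & ', @cells), " \\\\\n";   # write the row back out, ending in '\\'
```

Collapsing whitespace before trimming means each cell comes back on a
single line, so the rows would need re-wrapping afterwards if line
length matters.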

-- 
richard