
Re: another script query (perl?)



On Fri, Sep 07, 2007 at 07:19:17AM -0700, tabris wrote:

> Richard Lyons wrote:
> > Hi, all you script wizards.
> >
> > I thought this would be easy, but I haven't found anything to crib
> > from...
> >
> > I need a script to read a text file (actually tex) and parse lines of a
> > table that may or may not span newline characters in the file.
> > Basically, there are lines of the form
> >
> >    {some text} & {some more text} & {text c} & {text d} \\
> >
> > where the braces are only for clarity and do not occur in the files, and
> > where the bits of text may include whitespace which may include newline
> > characters. There may also be escaped ampersands in the text ('\&'), and
> > the text fragments may be empty.
> >
> > I suspect perl may be the way forward.  I need to be able to read each
> > file, parse each set of three ampersands with a double backslash
> > breaking it into four substrings, manipulate the substrings and write
> > the file anew.  A typical manipulation will be to take text c and copy
> > it to text d. I shall also try to strip leading and trailing whitespace
> > to tidy up the file.
> >
> > Any and all pointers will be gratefully received!
> >
> >   
> please give real examples of the text you have, as well as more info about
> what processing you will do with it.
> There are multiple ways to approach this, we need to have more
> information first.
> 
I'm not sure it helps much, as they vary quite a lot, but here is one:

    \mbox{Walls} &Plain plastered and painted white. &GC but to soiled
around switch, RHS as entering, HL marks. OW nail near centre, some
 blue-tac remnants. LHW hairline cracking at HL. pipe boxing far RH
   corner, white painted, cracks at junctions.  & \\

and here is another:

   &catch, diecast \& epoxy coated with security lock & GC &\\

If it is unclear to any non-latex-user, the ampersands are table column
separators in latex.
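
To pin down the separator rule, here is a rough sketch of splitting one row
on ampersands while leaving '\&' alone. It is in Python rather than Perl,
purely as an illustration; the name split_cells is made up for this example,
and it assumes the row terminator '\\' has already been removed from the row.

```python
import re

# Match '&' only when it is not preceded by a backslash, so '\&' stays
# inside its cell. Assumption: the trailing '\\' was stripped beforehand.
UNESCAPED_AMP = re.compile(r'(?<!\\)&')

def split_cells(row):
    """Return the cells of one table row, stripped of surrounding whitespace."""
    return [cell.strip() for cell in UNESCAPED_AMP.split(row)]

cells = split_cells(r"&catch, diecast \& epoxy coated with security lock & GC &")
print(cells)
```

For that row the result has four cells, the first and last of them empty.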

After the manipulation I gave as an example (copy text c to text d), I
would hope they would look like this:

\mbox{Walls} & Plain plastered and painted white. & GC but to soiled
around switch, RHS as entering, HL marks. OW nail near centre, some
blue-tac remnants. LHW hairline cracking at HL. pipe boxing far RH
corner, white painted, cracks at junctions. & GC but to soiled around
switch, RHS as entering, HL marks. OW nail near centre, some blue-tac
remnants. LHW hairline cracking at HL. pipe boxing far RH corner, white
painted, cracks at junctions. \\

and:

  & catch, diecast \& epoxy coated with security lock & GC & GC \\

The first example shows the problem of included newlines, which might
occur as here or anywhere else in the text. Note that the whole text
fragment has been copied to the previously void fourth field.

The second example shows the need not to be confused by '\&'.  
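
Put together, the whole manipulation might be sketched like this. Again it
is Python rather than Perl and only a sketch: the name copy_c_to_d is
hypothetical, it assumes the input holds nothing but table rows each ending
in '\\', and it writes every row on a single line instead of rewrapping it.
Rows that do not have exactly four cells are only tidied, not changed.

```python
import re

ROW_END = re.compile(r'\\\\')            # LaTeX row terminator: two backslashes
UNESCAPED_AMP = re.compile(r'(?<!\\)&')  # '&' not preceded by a backslash

def copy_c_to_d(text):
    """Copy the third cell into the fourth for every four-cell row."""
    rows = []
    pos = 0
    for end in ROW_END.finditer(text):
        # Everything since the previous terminator is one row, newlines and all.
        chunk = text[pos:end.start()]
        pos = end.end()
        # Collapse whitespace runs (including newlines) and trim each cell.
        cells = [" ".join(cell.split()) for cell in UNESCAPED_AMP.split(chunk)]
        if len(cells) == 4:
            cells[3] = cells[2]
        rows.append((" & ".join(cells) + " \\\\").strip())
    # Keep any trailing text that is not terminated by '\\'.
    tail = text[pos:].strip()
    if tail:
        rows.append(tail)
    return "\n".join(rows)

sample = r"   &catch, diecast \& epoxy coated with security lock & GC &\\" + "\n"
print(copy_c_to_d(sample))
```

Splitting on '\\' first and on '&' second is what lets a cell span several
lines: by the time the cells are separated, the newlines are just whitespace
inside the row.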

If that is any clearer...

-- 
richard


