Re[2]: OT: how to strip out SGML tags?
erik <erik@bossa.org> wrote:
> > ## Use STDIN if no files are given
> > $ARGV[0] = "-" unless @ARGV;
> >
> > ## Strip out anything contained in an SGML markup tag. This is not
> > ## very pretty and rather inefficient, but it does take care of tags
> > ## which cross line or paragraph boundaries.
> > foreach $file (@ARGV) {
> > open(INPUT,$file);
> > while($char = getc(INPUT)) {
> > if($char eq "<") {
> > IGNORE: for(;;) {
> > last IGNORE if (getc(INPUT) eq ">");
> >
> ... not sure why the IGNORE thing is in here; it seems like this should
> work, but I would have simply done:
> if($char eq "<") {
> while(getc(INPUT) ne ">") {
> ;
> }
> }
>
I had trouble with your idea, but I went back to the original script I posted
and discovered the real problem: it stops reading whenever it encounters the
character '0'! Apart from that it works fine. It just so happened that I had a
'0' in the first few lines of my SGML, but I didn't grasp the implication.
getc returns the string "0", which Perl treats as false, so the condition
'$char = getc(INPUT)' fails and control drops out of the while loop to the
code that closes the file. What's the Perl equivalent of WHILE NOT EOF? <g>
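
One possible answer, as a minimal sketch rather than a definitive fix: getc
returns undef at end of file, so testing the result with defined() instead of
for truth lets a '0' through. The helper name strip_tags, the lexical
filehandle, and the sample text are assumptions made for illustration; none
of them appear in the original script. Writing until (eof(INPUT)) around the
loop should be a closer literal match for WHILE NOT EOF.

```perl
use strict;
use warnings;

# strip_tags is a hypothetical helper, not part of the original script.
# It reads one character at a time and drops everything between < and >.
sub strip_tags {
    my ($fh) = @_;
    my $out = '';
    # getc returns undef at end of file, so test definedness, not truth.
    # The string "0" is defined, so it no longer ends the loop early.
    while (defined(my $char = getc($fh))) {
        if ($char eq '<') {
            # Skip up to and including the closing ">". Checking defined()
            # here too keeps an unterminated tag from looping forever at EOF.
            while (defined(my $skip = getc($fh))) {
                last if $skip eq '>';
            }
            next;
        }
        $out .= $char;
    }
    return $out;
}

# Demonstration on an in-memory filehandle (made-up sample text with a tag
# that crosses a line boundary and a '0' near the start).
my $sgml = "<p>Version 0 of the <em\nclass=\"x\">tool</em> has 10 users</p>\n";
open(my $in, '<', \$sgml) or die "cannot open in-memory handle: $!";
print strip_tags($in);
close($in);
```

The same defined() test can be dropped into the original loop as
while (defined($char = getc(INPUT))) without changing anything else.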
> Look reasonable?
--
Bob Bernstein http://www.ruptured-duck.com