Re: "Sitemap" webpage
Tomohiro KUBOTA:
> $title =~ s/^#use .* title="(.+?)(" .*$|"$|\e.*$)/$1/;
>
> I think it should be modified as:
>
> $title =~ s/^#use .* title="(.+?)("\s.*$|"$)/$1/;
That does not work (that was my first attempt), because there are some
Japanese pages that have
title="<switch to 0208>DBCS<switch to 0201>"<switch to ASCII><space>
and those were not matched properly, because the closing quote is followed
by an escape sequence rather than by whitespace. However, I seem to have
left out a quotation mark in the regexp; it should read:
$title =~ s/^#use .* title="(.+?)(" .*$|"$|"\e.*$)/$1/;
                                           ^
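
To illustrate the difference, here is a minimal sketch that runs both
regexps on a line shaped like the Japanese pages described above. The
template name "some::template" and the payload bytes "AAAA" are made-up
placeholders, and the escape sequences are assumed to be the usual
ISO-2022-JP ones (ESC $ B for JIS X 0208, ESC ( J for JIS X 0201 Roman,
ESC ( B for ASCII).

```perl
use strict;
use warnings;

# Regexp from this message, with the quotation mark added before \e.
sub extract_title {
    my ($title) = @_;
    $title =~ s/^#use .* title="(.+?)(" .*$|"$|"\e.*$)/$1/;
    return $title;
}

# Variant proposed in the quoted text: no \e alternative.
sub extract_title_strict {
    my ($title) = @_;
    $title =~ s/^#use .* title="(.+?)("\s.*$|"$)/$1/;
    return $title;
}

# Make escape characters visible when printing.
sub show {
    my ($s) = @_;
    $s =~ s/\e/<ESC>/g;
    return $s;
}

# Hypothetical sample data: placeholder payload between the charset switches.
my $dbcs  = "\e\$BAAAA\e(J";
# Closing quote, then the switch back to ASCII, then a space.
my $line  = qq{#use some::template title="$dbcs"\e(B };
my $ascii = qq{#use some::template title="Site map"};

print "with \\e alternative:    ", show(extract_title($line)), "\n";
print "without \\e alternative: ", show(extract_title_strict($line)), "\n";
print "plain ASCII title:      ", show(extract_title($ascii)), "\n";
```

In this sketch the variant without the \e alternative never finds a
closing quote it accepts, so the substitution does not happen and the
whole line is left in $title.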
> (I also modified to use \s instead of 0x20 space because it can
> match tab. This is not related to the problem we are discussing
> now.)
That might be a good idea as well.
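
A rough sketch of what combining the two changes could look like, i.e. the
corrected regexp with \s in place of the literal space in the first
alternative. This is only an assumption about the eventual fix, not what
is in CVS; the template name, the "other" attribute and the payload bytes
are made up for the example.

```perl
use strict;
use warnings;

# Corrected regexp as given above (literal space after the closing quote).
sub title_literal_space {
    my ($title) = @_;
    $title =~ s/^#use .* title="(.+?)(" .*$|"$|"\e.*$)/$1/;
    return $title;
}

# Assumed combination: \s after the closing quote, \e alternative kept.
sub title_combined {
    my ($title) = @_;
    $title =~ s/^#use .* title="(.+?)("\s.*$|"$|"\e.*$)/$1/;
    return $title;
}

# Hypothetical sample lines.
my $tab_line = qq{#use some::template title="Site map"\tother="x"};
my $dbcs     = "\e\$BAAAA\e(J";
my $esc_line = qq{#use some::template title="$dbcs"\e(B };

print "literal space, tab-separated: ", title_literal_space($tab_line), "\n";
print "combined, tab-separated:      ", title_combined($tab_line), "\n";
```

With a tab after the closing quote, the literal-space version runs on to
the last quotation mark on the line, while the combined version stops at
the first one.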
I can't commit a fix right now: the computer that holds my CVS checkout is
currently disassembled, since I removed a failing CD-ROM drive. I'll try
to fix it later today, though.
--
\\//
peter - http://www.softwolves.pp.se/
Statement concerning unsolicited e-mail according to Swedish law:
http://www.softwolves.pp.se/peter/reklampost.html