Re: "Sitemap" webpage



Tomohiro KUBOTA:

>   $title =~ s/^#use .* title="(.+?)(" .*$|"$|\e.*$)/$1/;
>
> I think it should be modified as:
>
>   $title =~ s/^#use .* title="(.+?)("\s.*$|"$)/$1/;

That does not work (it was my first attempt), because some Japanese
pages have

   title="<switch to 0208>DBCS<switch to 0201>"<switch to ASCII><space>

and those were not matched properly: the closing quotation mark is
followed by an escape sequence, not by whitespace or the end of the
line. However, I seem to have left a quotation mark out of the regexp;
it should read:

   $title =~ s/^#use .* title="(.+?)(" .*$|"$|"\e.*$)/$1/;
                                              ^
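
To make the difference concrete, here is a minimal sketch that runs both
substitutions over three input lines. The helper names, the
"wml::example" template name and the sample lines are made up for
illustration and are not taken from the actual pages; the escape
sequences assume the usual ISO-2022-JP switches (ESC $ B, ESC ( J,
ESC ( B).

```perl
use strict;
use warnings;

# Corrected substitution from this message. The helper name is made up.
sub title_fixed {
    my ($title) = @_;
    $title =~ s/^#use .* title="(.+?)(" .*$|"$|"\e.*$)/$1/;
    return $title;
}

# The variant proposed in the quoted message, for comparison.
sub title_proposed {
    my ($title) = @_;
    $title =~ s/^#use .* title="(.+?)("\s.*$|"$)/$1/;
    return $title;
}

# Hypothetical input lines; "\e" is the ESC byte that starts each
# ISO-2022-JP charset switch.
my $plain = qq{#use wml::example title="Sitemap" NOHEADER="yes"};
my $eol   = qq{#use wml::example title="Sitemap"};
my $jis   = qq{#use wml::example title="\e\$BDBCS\e(J"\e(B };

for my $line ($plain, $eol, $jis) {
    my ($fixed, $proposed) = (title_fixed($line), title_proposed($line));
    s/\e/<ESC>/g for $fixed, $proposed;   # make the escapes visible
    print "fixed:    $fixed\nproposed: $proposed\n";
}
```

On the third line the proposed variant finds no match at all, because
the byte after the closing quote is ESC rather than whitespace, so the
whole "#use" line is left in place.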

> (I also modified to use \s instead of 0x20 space because it can
> match tab.  This is not related to the problem we are discussing
> now.)

That might be a good idea as well.
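
A small sketch of why \s helps, assuming the two changes are simply
combined (that combined regexp is my assumption, not something either of
us has committed, and the sample line is made up): with a tab after the
closing quote, the literal-space version runs on to the last quotation
mark on the line.

```perl
use strict;
use warnings;

# Hypothetical line with a tab, not a space, after the title attribute.
my $line = qq{#use wml::example title="Sitemap"\tNOHEADER="yes"};

# Literal space after the closing quote, as in the corrected regexp.
(my $with_space = $line) =~ s/^#use .* title="(.+?)(" .*$|"$|"\e.*$)/$1/;

# Same regexp with \s substituted for the space (assumed combination).
(my $with_class = $line) =~ s/^#use .* title="(.+?)("\s.*$|"$|"\e.*$)/$1/;

print "literal space: $with_space\n";
print "\\s:            $with_class\n";
```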


I can't commit a fix right now; the computer I have the CVS tree checked
out on is currently disassembled, since I removed a failing CD-ROM drive.
I'll try to fix it later today, though.

-- 
\\//
peter - http://www.softwolves.pp.se/

  Statement concerning unsolicited e-mail according to Swedish law:
  http://www.softwolves.pp.se/peter/reklampost.html


