
Re: "Sitemap" webpage



Hi,

At Fri, 6 Jul 2001 18:53:57 +0200 (CEST),
peter karlsson <peter@softwolves.pp.se> wrote:

> I have committed a fix now. It seems to work on my local machine (I
> can't read Japanese, but I can see that there is no mis-encoding left).

Thanks.  I have checked it.

I found that many items read only "Debian".  These pages have titles
of the form "Debian <something Japanese>", which in bytes is "Debian
<esc><JIS X 0208 specifier string><JIS X 0208 literal><esc><ASCII
specifier string>".  Thus, the first <esc> matches the \e alternative
in the regexp below and ends $title there.  (Note that the second
<esc> cannot end $title either.  <esc> cannot serve as an end marker
in any case.)

  $title =~ s/^#use .* title="(.+?)(" .*$|"$|\e.*$)/$1/;

I think it should be modified as:

  $title =~ s/^#use .* title="(.+?)("\s.*$|"$)/$1/;

I tested locally (as an independent perl script) and it works well
for such pages.
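
For reference, here is a minimal sketch of that kind of standalone
test.  The sample #use line, the JIS X 0208 bytes in it and the helper
names (title_old, title_new, show) are made up for illustration; only
the two regexps are taken from this mail.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Current regexp: \e (ESC) is one of the alternatives that end the title.
sub title_old {
    my ($line) = @_;
    $line =~ s/^#use .* title="(.+?)(" .*$|"$|\e.*$)/$1/;
    return $line;
}

# Proposed regexp: only the closing double quote ends the title.
sub title_new {
    my ($line) = @_;
    $line =~ s/^#use .* title="(.+?)("\s.*$|"$)/$1/;
    return $line;
}

# Make ESC bytes visible when printing.
sub show {
    my ($s) = @_;
    $s =~ s/\e/<esc>/g;
    return $s;
}

# Hypothetical sample: ESC $ B switches to JIS X 0208, ESC ( B switches
# back to ASCII; the four bytes in between stand for two characters.
my $jis  = "\e\$B\x46\x7C\x4B\x5C\e(B";
my $line = qq{#use wml::debian::template title="Debian $jis" NOHEADER="yes"};

print "current:  [", show(title_old($line)), "]\n";
print "proposed: [", show(title_new($line)), "]\n";
```

With the current regexp the title is cut at the first <esc>, so only
"Debian " remains; with the proposed one the whole ISO-2022-JP string
up to the closing quote is kept.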

(I also changed the literal 0x20 space to \s so that it matches a
tab as well.  This is not related to the problem we are discussing
now.)
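
A small sketch of the difference, assuming a hypothetical #use line
whose attributes are separated by a tab.  The helper names with_space
and with_s are made up; with_space is the proposed regexp but with the
literal space kept.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Proposed regexp, but still with a literal space after the closing quote.
sub with_space {
    my ($line) = @_;
    $line =~ s/^#use .* title="(.+?)(" .*$|"$)/$1/;
    return $line;
}

# Proposed regexp as written in this mail, using \s.
sub with_s {
    my ($line) = @_;
    $line =~ s/^#use .* title="(.+?)("\s.*$|"$)/$1/;
    return $line;
}

# Hypothetical sample: a tab instead of a space follows the title attribute.
my $line = qq{#use wml::debian::template title="Debian"\tNOHEADER="yes"};

print "space: [", with_space($line), "]\n";
print "\\s:    [", with_s($line), "]\n";
```

With the literal space, the first closing quote is not recognized, so
the match runs on to the last quote on the line; with \s the title
ends where it should.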

---
Tomohiro KUBOTA <kubota@debian.org>
http://www.debian.or.jp/~kubota/
"Introduction to I18N"  http://www.debian.org/doc/manuals/intro-i18n/


