Re: "Sitemap" webpage



Tomohiro KUBOTA:

> Imagine an ISO-2022-JP string that has a JIS X 0208 part followed by
> an ASCII part. When the JIS X 0208 part ends with 0x22, it matches "\e
> and thus the regexp will fail.

Yes, I am aware of that, but since regular expressions are not powerful
enough to handle every possible combination of character-set switches,
I can't do it entirely the "right" way. *If* the problem arises, we'll
have to implement a special case for parsing the Japanese titles.

(Of course, if the Japanese pages had used a stateless encoding, such
as EUC-JP, this would not have been a problem.)
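
To make the failure mode concrete, here is a rough Python sketch. It is
not the script used for the sitemap; the helper name ascii_quote_offsets
is made up for illustration, and it assumes only the four escape
sequences of plain ISO-2022-JP. It shows that a title without any
quotation mark can still contain the byte pair 0x22 ESC, and how
tracking the shift state avoids the false match.

```python
# Three-byte escape sequences of plain ISO-2022-JP (assumed set).
SINGLE_BYTE = {b"\x1b(B", b"\x1b(J"}   # ASCII, JIS X 0201 Roman
DOUBLE_BYTE = {b"\x1b$@", b"\x1b$B"}   # JIS X 0208-1978, JIS X 0208-1983


def ascii_quote_offsets(data):
    """Return offsets of 0x22 bytes that are real quotation marks.

    Hypothetical helper: it walks the byte string, follows the escape
    sequences, and ignores 0x22 bytes inside two-byte characters.
    """
    offsets = []
    double = False  # True while a JIS X 0208 set is selected
    i = 0
    while i < len(data):
        seq = data[i:i + 3]
        if seq in SINGLE_BYTE:
            double = False
            i += 3
        elif seq in DOUBLE_BYTE:
            double = True
            i += 3
        elif double:
            i += 2  # one JIS X 0208 character occupies two bytes
        else:
            if data[i] == 0x22:
                offsets.append(i)
            i += 1
    return offsets


# HIRAGANA LETTER A is 0x24 0x22 in JIS X 0208, so its second byte
# looks like an ASCII quotation mark.
encoded = "\u3042".encode("iso2022_jp")
false_match = b'"\x1b' in encoded          # naive byte-level test
real_quotes = ascii_quote_offsets(encoded)  # state-aware scan
print(encoded, false_match, real_quotes)
```

The naive test reports a quote followed by ESC although the title has
no quote at all, while the state-aware scan finds none. Decoding the
title to Unicode before matching would sidestep the issue in the same
way.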

-- 
\\//
peter - http://www.softwolves.pp.se/

  Statement concerning unsolicited e-mail according to Swedish law:
  http://www.softwolves.pp.se/peter/reklampost.html
