
Re: automatically-generated ISO-8859-1 characters in multibyte webpages



On Wed, Jan 08, 2003 at 10:23:58AM +0900, Tomohiro KUBOTA wrote:
> > Anyway, though I don't know of such a module, your approach can be
> > implemented very easily.  I think the simplest way is something like
> > the following:
> > 
> >       $name =~ s/([\x80-\xff])/"&#".ord($1).";"/eg;
> 
> I wrote a new filter which
>   - assumes the input string is UTF-8 if it can be interpreted as such,
>   - assumes it is ISO-8859-1 if not.
> 
> Since the UTF-8 encoding is relatively strict, a string intended as
> ISO-8859-1 is unlikely to be mistaken for UTF-8.  I confirmed that
> people.names contains no 8-bit octet sequence which can be interpreted
> as UTF-8.  (An isolated 8-bit octet cannot be UTF-8; in UTF-8, 8-bit
> octets always appear in sequences of two or more.)
> 
> With this filter, my concern is completely resolved.  Also, you won't
> need to worry about future maintenance work when a new maintainer uses
> 8-bit characters in his/her name.

Sounds very good, thanks.
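
For readers of the archive, the filter described above could be sketched
roughly as follows. This is not the actual filter, only a minimal
illustration under some assumptions: it uses the Encode module (Perl 5.8
or later), the name octets_to_ncr is hypothetical, and the output is
assumed to be numeric character references, as in the one-liner quoted
above.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (the name is illustrative, not from the real filter):
# take a raw octet string and return ASCII text in which every character
# above 0x7f is written as a numeric character reference.
sub octets_to_ncr {
    my ($octets) = @_;

    # Try strict UTF-8 first.  FB_CROAK makes decode() die on malformed
    # input.  A copy is passed because decode() may modify its argument
    # when a CHECK value is given.
    my $copy = $octets;
    my $text = eval { decode('UTF-8', $copy, Encode::FB_CROAK) };

    # Not valid UTF-8: fall back to ISO-8859-1, where each octet maps to
    # the code point with the same number.
    $text = decode('ISO-8859-1', $octets) unless defined $text;

    # Replace every non-ASCII character with &#NNN;
    $text =~ s/([^\x00-\x7f])/'&#' . ord($1) . ';'/eg;
    return $text;
}
```

With this sketch, both octets_to_ncr("Andr\xe9") (ISO-8859-1) and
octets_to_ncr("Andr\xc3\xa9") (UTF-8) give "Andr&#233;". As noted in the
quoted text, an ISO-8859-1 string that happens to form valid UTF-8 would
be taken as UTF-8, so the guess is a heuristic rather than a guarantee.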

-- 
     2. That which causes joy or happiness.


