
Bug#207455: acknowledged by developer



Hi,

From: Frank Lichtenheld <djpig@djpig.de>
Subject: Bug#207455: acknowledged by developer (Re: Bug#207455: packages.debian.org: HTML-encodes multi-byte characters as single bytes)
Date: Thu, 4 Sep 2003 18:46:36 +0200

> 1. Take them literally and specify a charset=ascii for the page
> 2. Ditto, but charset=iso-8859-1
> 3. Dito, but charset=utf-8
> 4. Use one of the three charsets but make a list of broken descriptions
> that have to be converted
> 
> Currently we do (2) but I would prefer to go to (3). As long as policy
> doesn't mandate one encoding for the description, it's our decision
> anyway and I would prefer to give everyone the same chance to break
> something ;)
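Option (4) could be sketched in Python roughly as follows. This is only an illustration under assumptions: the descriptions are taken to be available as raw bytes keyed by package name, and `find_non_utf8_descriptions` is a hypothetical helper, not part of any packages.debian.org code. A description that does not decode as UTF-8 is flagged as a candidate for conversion.

```python
def find_non_utf8_descriptions(descriptions: dict[str, bytes]) -> list[str]:
    """Return the (hypothetical) package names whose description is not valid UTF-8."""
    broken = []
    for package, raw in descriptions.items():
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            # Possibly written in a legacy 8-bit charset such as ISO-8859-1.
            broken.append(package)
    return broken


# Hypothetical sample data: the same word stored in two different encodings.
sample = {
    "pkg-utf8": "café".encode("utf-8"),
    "pkg-latin1": "café".encode("iso-8859-1"),
    "pkg-ascii": b"plain ASCII text",
}
print(find_non_utf8_descriptions(sample))  # ['pkg-latin1']
```

Passing this check does not prove a text was meant as UTF-8: some ISO-8859-1 byte sequences happen to be valid UTF-8 as well, so a list like this only catches the obviously broken cases.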

Yes, ISO-8859-1 is a *local* character encoding that is useful only
to speakers of some European languages.
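To make the point about ISO-8859-1 concrete, a small Python check (the helper name `can_encode` is hypothetical) shows which sample strings each encoding can represent at all. Greek and Japanese text cannot be stored in ISO-8859-1, while UTF-8 covers all three.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of `text` is representable in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Illustrative samples: Western European, Greek and Japanese text.
for sample in ["café", "Ελληνικά", "日本語"]:
    print(sample, can_encode(sample, "iso-8859-1"), can_encode(sample, "utf-8"))
```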

Currently, ASCII is the only character range that is common throughout
the world.  Though migration to UTF-8 is welcome, please note that
U+0020 - U+007E will remain the only common character range for
a while.

If policy mandates the use of UTF-8, it will also have to state that
the contents must remain comprehensible even when read in an ASCII
environment, i.e., even when non-ASCII characters are removed.
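The two conditions above can be expressed as small Python checks. This is a sketch only: the function names are hypothetical, `is_common_range` tests for the printable range U+0020 - U+007E, and `ascii_view` approximates what a reader in an ASCII environment is left with by dropping every character above U+007F.

```python
def is_common_range(text: str) -> bool:
    """True if every character lies in the printable ASCII range U+0020 - U+007E."""
    return all(0x20 <= ord(ch) <= 0x7E for ch in text)


def ascii_view(text: str) -> str:
    """Drop all non-ASCII characters (code points above U+007F)."""
    return "".join(ch for ch in text if ord(ch) < 0x80)


for description in ["Debian GNU/Linux", "naïve café"]:
    print(is_common_range(description), repr(ascii_view(description)))
```

For "naïve café" the ASCII view is "nave caf": still partly recognizable, whereas a description written entirely in non-ASCII characters would disappear completely.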

Indeed, in the multibyte locales popular in East Asia, a stray 8-bit
character (for example, one encoded in ISO-8859-1) breaks not only
that character itself but also the character that follows it.
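The effect can be reproduced with Python's `euc_jp` codec. This is an illustrative sketch: the sample strings are made up, and the exact replacement characters depend on the decoder in use. EUC-JP treats a byte in 0xA1 - 0xFE as the first half of a two-byte character, so a stray ISO-8859-1 byte such as 0xE9 ("é") swallows the first byte of the Japanese character that follows it.

```python
def read_as_eucjp(raw: bytes) -> str:
    # Decode as an EUC-JP locale would; undecodable bytes become U+FFFD.
    return raw.decode("euc_jp", errors="replace")


stray = "é".encode("iso-8859-1")   # one byte: 0xE9
japanese = "日本".encode("euc_jp")  # two bytes per character

clean = read_as_eucjp(japanese)
mixed = read_as_eucjp(stray + japanese)

print(clean)  # 日本
print(mixed)  # "日" is gone: 0xE9 paired with its first byte
```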

Even though we have to be careful when using UTF-8, it is much better
than the current situation, in which Debian is biased toward one part
of the world (i.e., ISO-8859-1 usage).

Note that I think UTF-8 environments will not become popular until
several basic features (like manpages) are UTF-8-ready.

---
Tomohiro KUBOTA <kubota@debian.org>
http://www.debian.or.jp/~kubota/




