
Re: charsets in debian/control



[Thaddeus H. Black]
> Would Peter permit me a mild dissent?  I prefer Latin-1.

Dissents are fine. (:

The reason to go with UTF-8 is consistency.  Tools that wish to
render text onto the screen ought to be able to depend on knowing the
encoding that text is in.  See below for why I (and many others) think
UTF-8 is the right choice for an encoding to standardize on.

> I do not deny that Latin-1 represents all the languages I can read,
> and that this fact may color my view.  Nevertheless to me a source
> written in Chinese is effectively non-free.  It might as well be a
> compiled binary blob.

Consider packages intended for speakers of other languages: for
example, an Urdu dictionary.  The Description field would traditionally
describe the package both in English and in Urdu (which uses the Arabic
alphabet), and I think that's perfectly fine: the target audience can
read its description more easily, and the rest of us can read the
English.  Now extrapolate to cases involving arbitrary languages, and
this is possible only if the Description field uses an encoding of
Unicode.  (Well, one could invent an extra header to specify the
character set, but that seems pointless in the extreme.)
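
To make the point concrete, here is a minimal Python sketch.  The
description string is a made-up sample (not taken from any real
package) mixing English and Urdu; it only illustrates that Latin-1
cannot represent such text while UTF-8 round-trips it unchanged.

```python
# Hypothetical Description text: English followed by Urdu ("Urdu lughat").
# Escapes are used so the snippet does not depend on the file's own encoding.
description = "Urdu dictionary / \u0627\u0631\u062f\u0648 \u0644\u063a\u062a"

# Latin-1 covers only U+0000..U+00FF, so the Arabic-script letters fail.
try:
    description.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# UTF-8 can encode every Unicode character and decodes back losslessly.
utf8_bytes = description.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

print("Latin-1 can encode it:", latin1_ok)
print("UTF-8 round-trips:", round_trip == description)
```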

UTF-8 is by far the best encoding of Unicode for our purposes, since it
was designed to be compatible with tools that parse ASCII.  Other
Unicode encodings, such as UTF-16 and UTF-32, have null bytes and other
ASCII-range byte values embedded in the encodings of non-ASCII
characters.
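
A rough way to check this claim, sketched in Python.  The helper name
ascii_range_bytes and the sample text are my own inventions for
illustration; UTF-16 (little-endian) stands in here for the "other
Unicode encodings".

```python
def ascii_range_bytes(text, encoding):
    """Collect bytes below 0x80 that come from encoding non-ASCII characters."""
    found = []
    for ch in text:
        if ord(ch) < 0x80:
            continue  # real ASCII characters are supposed to yield ASCII bytes
        found.extend(b for b in ch.encode(encoding) if b < 0x80)
    return found

# Hypothetical sample: a Polish place name and the word "Urdu" in Urdu.
sample = "\u0141\u00f3d\u017a \u0627\u0631\u062f\u0648"

utf8_hits = ascii_range_bytes(sample, "utf-8")
utf16_hits = ascii_range_bytes(sample, "utf-16-le")

# In UTF-8, every byte of a non-ASCII character has its high bit set.
print("UTF-8 :", utf8_hits)
# In UTF-16, non-ASCII characters produce null bytes and bytes such as 0x2F ('/').
print("UTF-16:", [hex(b) for b in utf16_hits])
```

A tool that splits on newlines, colons or slashes therefore never
trips over the inside of a multi-byte UTF-8 character, whereas the
UTF-16 bytes above include 0x00 and 0x2F.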

You can argue, and I would agree, that the Maintainer and Uploaders
fields (the only fields other than Description where we are likely to
see non-ASCII text) ought to be written in roman letters.  People
involved with Debian development are required to know a certain amount
of English in any case, so the roman alphabet is a common denominator.
And, unlike in the Description field, it's awkward to try to have both
native glyphs and a roman transliteration there.  However, I see no
reason to tell Eastern Europeans that they cannot write their names
natively; interpreting Eastern European diacritics is no harder for
people who don't speak those languages than interpreting Western
European diacritics is for people who don't speak those.

Peter
