
Re: Status of UTF-8 Debian changelogs



On Sat, Jun 07, 2003 at 04:59:29PM +0300, Dmitry Borodaenko wrote:
> On Thu, Jun 05, 2003 at 08:57:06PM -0400, Colin Walters wrote:
>  JR>> the only thing that will change is that if someone complains at
>  JR>> people who use UTF-8 in changelogs, a new retort will be
>  JR>> available, "THE POLICY MADE ME DO IT!!1!", or similar.
>  CW> Why would someone complain?
> 
> I would complain.
> 
> I am using KOI8-R terminal which can not display Latin-1 characters,

Where did Latin-1 come into this?

> and it seems backward to me to mandate or even allow _usage_ of UTF-8
> ahead of getting it _supported_ across the system.

If you find yourself with a UTF-8 file, use a program which knows how to
recode on the fly to your native encoding. Such programs are
increasingly common.

What do you lose here? Those who have fonts that can display the
character in question will be able to do so; those who don't won't, but
will see some reasonably obvious indicator like a "?" or a filled-in
square to show that the character is one they can't display. This is
superior to the situation where those who don't have such fonts just see
some gibberish.
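
As a rough illustration of what recoding on the fly might look like, here
is a minimal Python sketch. It assumes a reader whose native encoding is
KOI8-R; the function name recode_for_terminal and the sample changelog
line are invented for this example and are not taken from any existing
tool.

```python
def recode_for_terminal(utf8_bytes: bytes, native: str = "koi8_r") -> bytes:
    """Recode UTF-8 input to a legacy encoding (hypothetical helper)."""
    text = utf8_bytes.decode("utf-8")
    # errors="replace" substitutes "?" for any character that the target
    # encoding cannot represent, instead of emitting gibberish.
    return text.encode(native, errors="replace")


# Invented changelog line mixing Cyrillic and an accented Latin letter.
sample = "  * Thanks to Иван and José for the patch.\n".encode("utf-8")
recoded = recode_for_terminal(sample)

# On a KOI8-R terminal the Cyrillic name survives and "é" becomes "?".
print(recoded.decode("koi8_r"), end="")
```

The Cyrillic name comes through intact, and the one character KOI8-R
lacks shows up as an obvious "?" rather than as a run of unrelated
symbols.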

> I'd rather have 7-bit ASCII changelogs: why Latin-1 users are
> privileged to use native spelling of their names, while Cyrillic and
> Kanji and other users have to resort to transliteration?

They aren't so privileged. They may decide to do it anyway, but since
the encoding of changelogs is not yet specified you currently take pot
luck on anything outside 7-bit ASCII.

I believe you've just contradicted yourself, anyway. Nobody wants to
have to transliterate their name. I don't want to have to transliterate
the names of people who help me with my packages when I credit them in
the changelog; in some cases I may not even know how to transliterate
their names correctly. UTF-8 allows me to spell their names correctly.
At worst, a couple of characters may not be displayed properly for
people using legacy encodings who don't have software that can recode
for them, but if I'd artificially transliterated to 7-bit ASCII then
nobody would get to see the correct spellings anyway.

Since UTF-8 includes ASCII, all the technical content of my changelogs
will still appear normally no matter what locale you're using, but
suddenly it becomes possible for me to credit my contributors properly
regardless of whether they come from Spain, Russia, or Japan.
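
The "UTF-8 includes ASCII" point can be checked directly. The following
is only a small Python sketch with an invented changelog line: ASCII
characters keep their single-byte values in UTF-8, and every byte of a
non-ASCII character has the high bit set, so the two can never be
confused.

```python
# Invented changelog line: ASCII technical content plus a non-ASCII name.
ascii_part = "  * Fix segfault in parser (thanks, "
name = "渡辺"
line = ascii_part + name + ").\n"
utf8 = line.encode("utf-8")

# ASCII text is byte-for-byte identical whether encoded as ASCII or UTF-8.
assert ascii_part.encode("utf-8") == ascii_part.encode("ascii")

# Every byte belonging to a non-ASCII character is >= 0x80, so it cannot
# be mistaken for an ASCII character by a reader in another locale.
assert all(b >= 0x80 for b in name.encode("utf-8"))

# A reader that only understands ASCII still recovers the technical text;
# only the name is replaced by placeholder characters.
print(utf8.decode("ascii", errors="replace"), end="")
```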

We're not talking about mandating the use of UTF-8 across the whole
system here. We're talking about recommending its use in one particular
case where it gives a small but real benefit, and where the consequences
of getting it wrong are not very important (we can always go back and
recode a few changelogs if some unforeseen badness results). Think of it
as a safe experiment in advance of wider deployment of UTF-8 later on.

Package maintainers who aren't set up for writing UTF-8 can always
resort to transliteration into ASCII if need be.
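
For what it's worth, a crude ASCII fallback might look like the Python
sketch below. The helper to_ascii is hypothetical, and it only strips
accents from Latin letters; it does not know how to transliterate other
scripts, which is exactly the case where a maintainer may not know the
correct transliteration either.

```python
import unicodedata


def to_ascii(name: str) -> str:
    """Crude ASCII fallback (hypothetical helper): strip accents, and
    replace anything that still is not ASCII with "?"."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(
        c for c in decomposed if not unicodedata.combining(c)
    )
    return without_marks.encode("ascii", errors="replace").decode("ascii")


print(to_ascii("José Núñez"))  # accents are dropped: Jose Nunez
print(to_ascii("Иван"))        # no real transliteration: ????
```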

-- 
Colin Watson                                  [cjwatson@flatline.org.uk]


