
Re: Unicode conversion goals for Sarge (was Re: debian-changelog-file-uses-obsolete-national-charset)



On Wednesday 31 March 2004 08.05, Martin-Éric Racine wrote:

> Would recoding names in the GECOS data from /etc/passwd into UTF-8
> solve it?

No idea, if you're talking about the matching done by the Debian packaging 
scripts.

> If your answer to this is Yes, then I think that upgrading a distro to
> Sarge should also offer to upgrade /etc/passwd, /etc/shadow and
> /etc/groups to UTF-8.

The problem with that is that the system has no way to know what encoding 
is used in /etc/passwd. It could assume the locale the root user uses, or 
what is specified in /etc/environment, but I don't think this is safe.
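
A rough sketch of why guessing is unsafe, in Python. The function names 
are made up for illustration and nothing here comes from the Debian 
packaging scripts. The most a tool can tell from the bytes alone is 
"plain ASCII", "valid UTF-8" or "something else"; for "something else", 
several legacy encodings decode the same byte without error and give 
different characters:

```python
def classify_gecos(raw: bytes) -> str:
    """Classify one raw GECOS field as 'ascii', 'utf-8' or 'unknown-legacy'."""
    try:
        raw.decode("ascii")
        return "ascii"           # nothing to convert
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"           # already valid UTF-8 (very likely intended)
    except UnicodeDecodeError:
        return "unknown-legacy"  # some 8-bit encoding, but which one?


def candidate_readings(raw: bytes,
                       encodings=("iso-8859-1", "iso-8859-15", "iso-8859-2")):
    """Decode the same bytes under several legacy encodings.

    The encoding list is only an assumption; a real tool would have to
    ask the administrator instead of picking one.
    """
    readings = {}
    for enc in encodings:
        try:
            readings[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this encoding cannot represent the bytes
    return readings


if __name__ == "__main__":
    print(classify_gecos(b"Martin-\xc9ric Racine"))
    # Byte 0xA4 is the currency sign in Latin-1 but the euro sign in Latin-9.
    print(candidate_readings(b"\xa4"))
```

Every reading succeeds, so nothing in the file itself says which one the 
user meant.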

>
> Heck, if you ask me, Sarge should be known as the "we upgrade everyone
> to UTF-8" Debian release. This would imply that absolutely every
> package to be released in Sarge would know about legacy encodings for
> each locale and be able to recode every config file, man page, etc.
> during the process of upgrading from Woody.

I suspect too much software is not Unicode-ready yet. It's a worthy goal 
for sarge+1, though. I work in de_CH.utf-8, and the main problems are: 
man, which still has occasional glitches and AFAIK is a total hack under 
the hood as far as Unicode support is concerned; and fonts, where I had 
some lengthy fights with my system, and I don't want to have to set up a 
new system because I don't understand what's going on there. Then there 
is the text-mode console: perhaps the sarge installer sets this up by 
default, I don't know, but on a potato-installed system the text console 
doesn't work with Unicode characters at all.

The other problem is user data: I believe that the installer must not 
touch anything in /home - but switching to Unicode causes all users to 
'lose' umlauts/accents/whatever in their existing files, and in the case 
of non-Latin scripts, nearly everything. So it is necessary to have very, 
very, very good user documentation on how to upgrade. Text files are 
easy, but how about non-text document formats? I fear there are many such 
formats in use without proper encoding information embedded.
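
To illustrate the "easy" text file case, here is a minimal Python sketch. 
It assumes the user knows and supplies the old encoding; the function 
name and the example file names are hypothetical. It writes a new file 
instead of modifying the original, so nothing is lost if the guess was 
wrong:

```python
from pathlib import Path


def recode_text_file(src: Path, dst: Path, source_encoding: str) -> None:
    """Write a UTF-8 copy of src, which is assumed to be in source_encoding."""
    data = src.read_bytes()
    # Multi-byte encodings raise UnicodeDecodeError on invalid input;
    # single-byte ones such as iso-8859-1 accept any byte, so a wrong
    # guess goes undetected and produces wrong characters.
    text = data.decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    recode_text_file(Path("notes.txt"), Path("notes.utf8.txt"), "iso-8859-1")
```

This only works because a plain text file is nothing but encoded 
characters; it does not help with the non-text formats mentioned above.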

Already mentioned: telnet/ssh don't have a way to transport locale 
information over the connection - how do you tell this to users who may 
not be in a position to upgrade both machines at the same time and will 
suddenly lose the ability to use umlauts/accents/their script over a 
remote connection? (I don't have this problem because (i) when I write 
text, it's usually English and (ii) most machines I usually use ssh with 
are under my direct control, and so are also set up to use Unicode.)

Just my €.02
cheers
-- vbi

-- 
Today is Setting Orange, the 17th day of Discord in the YOLD 3170
