Re: in NEW: utf8-migration-tool -- Debian UTF-8 migration wizard

Martin-Éric Racine wrote:
On Sun, 2006-12-31 at 18:55 +0500, Alexander E. Patrakov wrote:
Martin-Éric Racine wrote:
Having merged Vincent's patch, I uploaded utf8-migration-tool to NEW.

Since Etch will be Debian's first UTF-8 release - implying a migration
from legacy encodings for those upgrading from Sarge, which is precisely
what this tool tackles - it would be nice to approve it for Etch.

1) patrakov@home:~$ utf8migrationtool
Unexpected error: exceptions.IOError
Traceback (most recent call last):
   File "/usr/bin/utf8migrationtool", line 40, in ?
     dmrc = getconfig()
   File "/usr/bin/utf8migrationtool", line 34, in getconfig
IOError: [Errno 2] No such file or directory: '/home/patrakov/.dmrc'

Works fine here, so no comment.

This is because you have the .dmrc file. I don't (I created an empty file to get past this error when writing my first mail). This file presumably belongs to gdm, but I don't have gdm (I use "startx"), and your package installs fine without gdm. Missing dependency?
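To illustrate how the tool could avoid the traceback above, here is a minimal sketch. It is not the actual getconfig() from utf8migrationtool (whose body is not shown in this thread); read_dmrc_language is a hypothetical helper, written for current Python 3. It assumes the usual gdm layout of a [Desktop] section with a Language= key, and it falls back to $LANG when the file is missing, empty, or unparsable.

```python
import configparser
import os


def read_dmrc_language(path, environ=None):
    """Return the locale named in a .dmrc file, else $LANG, else None.

    Hypothetical helper: assumes a [Desktop] section with a Language= key.
    """
    environ = os.environ if environ is None else environ
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep "Language" case-sensitive
    try:
        with open(path, encoding="utf-8", errors="replace") as handle:
            parser.read_file(handle)
    except FileNotFoundError:
        pass  # no gdm, no .dmrc: not an error for startx users
    except configparser.Error:
        pass  # malformed file: treat it like a missing one
    language = parser.get("Desktop", "Language", fallback=None)
    # An absent or empty Language= falls through to the environment.
    return language or environ.get("LANG") or None
```

With this shape, a user without gdm gets the locale from the environment instead of an IOError, and an empty .dmrc behaves the same as no .dmrc at all.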

2) The tool must handle the already-migrated case better (e.g., by adding a line about that to the second screen).

It does. Here, it says that the locale is already migrated. It also says
that it cannot find any files utilizing a legacy encoding.

Yes, it does, in the case when the old locale is from .dmrc.

3) The legacy locale for Russia is ru_RU.KOI8-R, not ru_RU, and the migration tool must handle this special case.

Russian is a messy case. Too many encodings, more than half of which are
OS-specific or are standards that never gained momentum. This is
further complicated by use cases: while Unices tend to go for KOI8-R,
users that need to interact with Windows use CP1251 instead. Still, it's
up to Russian developers to add support for this; upstream simply cannot
anticipate every possible exception.

OK, I temporarily take this back (because the old report was based on an empty .dmrc, although you could still take the .KOI8-R part from $LANG). However, I replace my old report with this: when the old .dmrc contains


the migration tool migrates this to ru_RU.KOI8-R.UTF-8, which is wrong. It also migrates de_DE@euro to de_DE@euro.UTF-8.

The locale names generally have the form:

ll_CC.CODESET@MODIFIERS (where .CODESET and @MODIFIERS may or may not be present). The old codeset and the @euro modifier (but probably not other modifiers) must be stripped out.
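The rule above can be sketched as follows. This is only an illustration of the stripping logic, not code from the tool; to_utf8_locale and the regular expression are hypothetical names, and keeping non-euro modifiers after the codeset is an assumption based on the ll_CC.CODESET@MODIFIERS form.

```python
import re

# ll[_CC][.CODESET][@MODIFIER]; names such as "C" or "POSIX" do not match.
_LOCALE_RE = re.compile(
    r"^(?P<lang>[A-Za-z]{2,3})"
    r"(?:_(?P<territory>[A-Za-z]{2}))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)


def to_utf8_locale(name):
    """Map a legacy locale name to its UTF-8 counterpart (hypothetical)."""
    match = _LOCALE_RE.match(name)
    if match is None:
        return name  # leave "C", "POSIX" and anything unrecognised alone
    result = match.group("lang")
    if match.group("territory"):
        result += "_" + match.group("territory")
    # Drop whatever codeset was there and use UTF-8 instead.
    result += ".UTF-8"
    modifier = match.group("modifier")
    # @euro only selected a legacy codeset, so it is dropped; other
    # modifiers (e.g. @latin) are kept after the codeset.
    if modifier and modifier.lower() != "euro":
        result += "@" + modifier
    return result
```

Under these assumptions, ru_RU.KOI8-R becomes ru_RU.UTF-8, de_DE@euro becomes de_DE.UTF-8, and sr_RS@latin becomes sr_RS.UTF-8@latin.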

4) Migration of encodings is only a part of the game. The most important part is to deal with packages that do not work correctly in UTF-8 locales and cannot be fixed (e.g., a2ps). Since this part cannot be automated (as nobody has created such a blacklist), I suggest mentioning this obstacle in the manual page and on the welcome screen.

Remaining UCS issues really belong in Etch's release notes, since it is
Debian's first release claiming UTF-8 support.

Yes, they do. However, not everyone reads the release notes, so why not point users to them explicitly on the welcome screen?

Thus, I cannot recommend migration of this package to Etch in its current shape.

And I still say this.

Alexander E. Patrakov
