
Re: utf8-migration-tool: [PATCH] remove the codeset and modifier of the current locale



Hi,

On Mon, Jan 08, 2007 at 12:28:42PM +0200, q-funk@iki.fi wrote:
> 
> Looking at the patch, this will parse any locale up to the first dot or
> arobas it finds and use the part before that to deduce the locale, which
> should successfully convert any locale to UTF-8.
> 
> The only exception I can think of is cases where the arobas points to a
> useful variant. One example is Serbian: the default encoding is Cyrillic
> but there is an @latin variant also. There, utf8migrationtool would make
> the wrong assumption.

I don't think it's an issue.
The regex is only used to build the new locale name, before ".UTF-8" is
appended; it is not used to guess the encoding.

This new locale is only used to replace the value in the .dmrc file.

Thus,
if ru_RU.KOI8-R is found in the .dmrc, the final .dmrc should contain ru_RU.UTF-8
if fr_FR@euro is found in the .dmrc, the final .dmrc should contain fr_FR.UTF-8

The original encoding does not matter when writing the new .dmrc. In both
cases, the codeset and the modifier are removed.
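A minimal Python sketch of that behaviour, assuming the regex keeps everything before the first "." or "@" (as the quoted message describes; the tool's actual pattern may differ). The helper names `to_utf8_locale` and `rewrite_dmrc_language` are hypothetical, and the .dmrc is assumed to store the locale on a `Language=` line.

```python
import re

# Hypothetical reconstruction of the regex: keep everything before the first
# "." (codeset) or "@" (modifier). The real utf8migrationtool pattern may differ.
_BASE_LOCALE = re.compile(r"^([^.@]+)")


def to_utf8_locale(old_locale: str) -> str:
    """Drop the codeset and modifier, then append ".UTF-8"."""
    match = _BASE_LOCALE.match(old_locale)
    if match is None:
        raise ValueError(f"cannot parse locale {old_locale!r}")
    return match.group(1) + ".UTF-8"


def rewrite_dmrc_language(dmrc_text: str) -> str:
    """Replace the value of the Language= line, leaving other lines untouched."""
    lines = []
    for line in dmrc_text.splitlines(keepends=True):
        key, sep, value = line.partition("=")
        if sep and key.strip() == "Language":
            newline = "\n" if line.endswith("\n") else ""
            line = f"{key}={to_utf8_locale(value.strip())}{newline}"
        lines.append(line)
    return "".join(lines)


print(to_utf8_locale("ru_RU.KOI8-R"))  # ru_RU.UTF-8
print(to_utf8_locale("fr_FR@euro"))    # fr_FR.UTF-8
```

Only the new locale name depends on the regex here; nothing in this sketch tries to infer the old encoding.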

There may be an issue with ca_ES@valencia, which should be changed to
ca_ES.UTF-8@valencia, not ca_ES.UTF-8. But that should not affect the
renaming of the files.
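One way to handle ca_ES@valencia is to split the name into base, codeset and modifier, and to keep the modifier unless it only selects an encoding. This sketch assumes, following the fr_FR@euro example, that "euro" is the only modifier to drop; the set `ENCODING_ONLY_MODIFIERS` and the function name are hypothetical.

```python
import re

# language[_territory][.codeset][@modifier], split into named groups.
_LOCALE = re.compile(
    r"^(?P<base>[^.@]+)(?:\.(?P<codeset>[^@]*))?(?:@(?P<modifier>.*))?$"
)

# Assumption: these modifiers only select a legacy encoding/currency variant,
# so they are dropped once the codeset is UTF-8. Any other modifier
# (e.g. "valencia", "latin") is treated as a real variant and kept.
ENCODING_ONLY_MODIFIERS = {"euro"}


def to_utf8_locale_keep_variant(old_locale: str) -> str:
    match = _LOCALE.match(old_locale)
    if match is None:
        raise ValueError(f"cannot parse locale {old_locale!r}")
    new_locale = match.group("base") + ".UTF-8"
    modifier = match.group("modifier")
    if modifier and modifier not in ENCODING_ONLY_MODIFIERS:
        new_locale += "@" + modifier  # modifier goes after the codeset
    return new_locale


print(to_utf8_locale_keep_variant("ca_ES@valencia"))  # ca_ES.UTF-8@valencia
print(to_utf8_locale_keep_variant("fr_FR@euro"))      # fr_FR.UTF-8
```

The same rule would also keep a Serbian @latin variant, which addresses the concern in the quoted message.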
There may also be an issue when there is no .UTF-8 locale (but IMHO, in
such cases, it means that UTF-8 is the only supported encoding, see dz_BT
for example, and utf8migrationtool should warn and exit).
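A sketch of the "warn and exit" check. It assumes the list of available locale names is passed in (on a glibc system it could come from `locale -a`, which prints names such as `fr_FR.utf8`); `require_utf8_locale` and the exit status 1 are hypothetical choices, not the tool's behaviour.

```python
import sys
from typing import Iterable


def _normalize(name: str) -> str:
    # Make "fr_FR.UTF-8" and "fr_FR.utf8" (the `locale -a` spelling) compare equal.
    base, dot, rest = name.partition(".")
    if not dot:
        return name
    codeset, at, modifier = rest.partition("@")
    return f"{base}.{codeset.lower().replace('-', '')}{at}{modifier}"


def require_utf8_locale(new_locale: str, available: Iterable[str]) -> str:
    """Return new_locale if it is available, otherwise warn and exit."""
    if _normalize(new_locale) not in {_normalize(name) for name in available}:
        print(f"utf8migrationtool: no {new_locale} locale available, "
              "nothing to migrate", file=sys.stderr)
        sys.exit(1)
    return new_locale
```

For dz_BT, where the plain locale name is already UTF-8, the constructed name dz_BT.UTF-8 would not be found and the sketch exits instead of writing a bogus value.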

The current encoding is determined from the current locale by
findEncoding(), before the locale goes through the regex.
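The ordering matters because the codeset and modifier are the only clues that stripping removes. Here is a sketch of that order of operations. `find_encoding_sketch` is a hypothetical stand-in, not the tool's findEncoding(): it only reads an explicit codeset or the @euro modifier (glibc's @euro locales use ISO-8859-15) and returns None otherwise.

```python
import re
from typing import Optional, Tuple

_BASE_LOCALE = re.compile(r"^([^.@]+)")


def find_encoding_sketch(old_locale: str) -> Optional[str]:
    """Hypothetical stand-in for findEncoding(): read the encoding from the name."""
    base, _, modifier = old_locale.partition("@")
    if "." in base:
        return base.split(".", 1)[1]   # e.g. "ru_RU.KOI8-R" -> "KOI8-R"
    if modifier == "euro":
        return "ISO-8859-15"           # @euro locales use ISO-8859-15
    return None                        # e.g. plain "fr_FR": needs a real lookup


def plan_migration(old_locale: str) -> Tuple[Optional[str], str]:
    # 1. Detect the encoding while the codeset/modifier are still present.
    encoding = find_encoding_sketch(old_locale)
    # 2. Only then strip them to build the new locale name.
    match = _BASE_LOCALE.match(old_locale)
    if match is None:
        raise ValueError(f"cannot parse locale {old_locale!r}")
    return encoding, match.group(1) + ".UTF-8"


print(plan_migration("fr_FR@euro"))  # ('ISO-8859-15', 'fr_FR.UTF-8')
```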

Also, looking at the code, I can see a few places where utf8migrationtool
should fail and quit:
 * if the .dmrc is not found, the code will fail
 * if the locale is not found in the .dmrc, utf8migrationtool uses
   locale.getlocale(locale.LC_ALL), which is not allowed according to the
   locale.getlocale() specification (this is probably the cause of #405878)
 * if no locale is found at all, there is no point in assuming that the
   locale is "C", because in that case no conversion is needed.
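A sketch of reading the current locale with those three points handled. It assumes the .dmrc has a `[Desktop]` section with a `Language=` key. The function names, the exit status, and the fallback to `locale.getlocale(locale.LC_CTYPE)` are assumptions, not the tool's code: LC_CTYPE is chosen because it is the category governing the character encoding, and getlocale() does not accept LC_ALL.

```python
import configparser
import locale
import os
import sys
from typing import NoReturn


def fail(message: str) -> NoReturn:
    print(f"utf8migrationtool: {message}", file=sys.stderr)
    sys.exit(1)


def read_current_locale(dmrc_path: str) -> str:
    # Point 1: without a .dmrc there is nothing to rewrite, so quit cleanly.
    if not os.path.isfile(dmrc_path):
        fail(f"{dmrc_path} not found")
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read(dmrc_path)
    except configparser.Error as exc:
        fail(f"cannot parse {dmrc_path}: {exc}")
    value = parser.get("Desktop", "Language", fallback="").strip()
    if value:
        return value
    # Point 2: getlocale() does not accept LC_ALL, so ask for LC_CTYPE instead.
    try:
        language, encoding = locale.getlocale(locale.LC_CTYPE)
    except ValueError:  # getlocale() raises this for unparsable locale names
        language, encoding = None, None
    # Point 3: do not fall back to "C"; with no locale there is nothing to convert.
    if not language:
        fail("no locale found in the .dmrc or the environment")
    return f"{language}.{encoding}" if encoding else language
```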

Kind Regards,
-- 
Nekral


