
Re: locale choices: <null>, C, C.UTF-8, en_US.UTF-8 and availability



On Mon, Sep 15, 2014 at 01:42:32AM +0900, Osamu Aoki wrote:
> LANG = C.UTF-8 (Not defined anywhere easily found, but available)
>  System can always be set this way under Jessie(??) and the system acts
>  in a 100% POSIX manner for ASCII characters while not corrupting UTF-8
>  character processing.
>  No locale data generation seems to be required under Jessie.

It's available in wheezy too, although not squeeze.
/usr/lib/locale/C.UTF-8/ is shipped by libc-bin and thus guaranteed to
be available without locale generation.

> One of the packages I maintain (ibus) has gettext data pairing
> en_US.UTF-8 data with localized text.  Currently, the user's choice in
> the localized language is translated back to an en_US.UTF-8 string, and
> that English text containing non-ASCII is used for further processing.

I believe gettext warns about doing this, for exactly this kind of
reason.  As such most upstreams avoid non-ASCII msgstrs and so this is
fortunately a rare situation.

> Jessie seems to have C.UTF-8 available as default, am I safe to use
> C.UTF-8 as always available UTF-8 compatible English system?

Yes.  You can't guarantee it's available on non-Debian-based systems
though.
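
For code that also has to run on systems where C.UTF-8 may be missing,
one option is to probe for it and fall back.  The following is only a
minimal sketch in Python; the function name and the default candidate
list are hypothetical and not taken from ibus or any other package.

```python
import locale


def first_available_locale(candidates=("C.UTF-8", "en_US.UTF-8", "C")):
    """Return the first candidate that setlocale() accepts, else None.

    The candidate list is an illustrative assumption; adjust it to
    whatever the application can actually cope with.
    """
    # Remember the current LC_CTYPE setting so the probe has no lasting effect.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
                return name
            except locale.Error:
                # This locale is not installed or generated; try the next one.
                continue
        return None
    finally:
        # Restore whatever was in effect before probing.
        locale.setlocale(locale.LC_CTYPE, saved)
```

On Debian wheezy and later the first candidate should be accepted, so
the fallbacks only matter elsewhere.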

> I see several ways to fix packaging of ibus:
> 
>  * Patch the upstream source with s/en_US.UTF-8/C.UTF-8/g .
>  * Add locales-all to package dependency is another solution.
>  * (If locales package always set up en_US.UTF-8, I am off the hook.)

The first sounds correct here based on your description of the problem.
The second is an unnecessarily heavyweight dependency, and the third
would privilege a particular national locale for a not very good reason.

> I guess some apps may be in the same situation.  How are other DDs
> coping with apps which temporarily set their internal locale value to
> the default en_US.UTF-8?

If all your package needs is to extend character handling to cover UTF-8
rather than merely ASCII, then using the C.UTF-8 locale is the correct
fix on Debian.

There are some packages which care about more details of the locale, for
a variety of good and bad reasons (I've seen ones that naïvely decompose
it and try to use the language part as a directory name, for instance),
and those are more difficult to handle.  If this is done at build-time
then I would normally suggest build-depending on locales (not
locales-all) and using localedef to generate a temporary locale for the
course of the build.  If this is done at run-time then I might
reluctantly accept the need for a dependency on locales-all, but it's
usually a sign that some refactoring is needed.
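
The build-time approach could look roughly like the sketch below, which
prepares a localedef invocation and a matching environment for a
throwaway locale directory.  The helper name is hypothetical, locale
names with modifiers (such as @euro) are not handled, and it assumes
glibc's localedef with its -i and -f options and the LOCPATH variable.

```python
import os


def localedef_plan(locale_name, outdir):
    """Return (argv, env) for generating locale_name under outdir.

    Hypothetical helper: it only builds the command and environment,
    it does not run anything.
    """
    # Split e.g. "en_US.UTF-8" into the input file and the charmap.
    lang, sep, charmap = locale_name.partition(".")
    if not (lang and sep and charmap):
        raise ValueError("expected a name like en_US.UTF-8")
    target = os.path.join(outdir, locale_name)
    argv = ["localedef", "-i", lang, "-f", charmap, target]
    # LOCPATH points glibc at the temporary directory; LC_ALL selects
    # the freshly generated locale for the commands run with this env.
    env = dict(os.environ, LOCPATH=outdir, LC_ALL=locale_name)
    return argv, env
```

A build script would create outdir (for example with tempfile.mkdtemp),
run argv via subprocess.run(argv, check=True), and then pass env to the
build or test commands that need the locale.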

-- 
Colin Watson                                       [cjwatson@debian.org]

