Re: locale choices: <null>, C, C.UTF-8, en_US.UTF-8 and availability
On Mon, Sep 15, 2014 at 01:42:32AM +0900, Osamu Aoki wrote:
> LANG = C.UTF-8 (Not defined anywhere easily found, but available)
> The system can always be set this way under Jessie(??), and it
> behaves in a 100% POSIX manner for ASCII characters without
> corrupting UTF-8 character processing.
> No locale data generation seems to be required under Jessie.
It's available in wheezy too, although not squeeze.
/usr/lib/locale/C.UTF-8/ is shipped by libc-bin and thus guaranteed to
be available without locale generation.
> One of the packages I maintain (ibus) has gettext data pairing
> en_US.UTF-8 data with localized text. Currently, the user's choice in
> the localized language is translated back to the en_US.UTF-8 string,
> and that English text containing non-ASCII is used for further
> processing.
I believe gettext warns about doing this, for exactly this kind of
reason. As such most upstreams avoid non-ASCII msgstrs and so this is
fortunately a rare situation.
> Jessie seems to have C.UTF-8 available by default. Am I safe to use
> C.UTF-8 as an always-available, UTF-8-compatible English locale?
Yes. You can't guarantee it's available on non-Debian-based systems
though.
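
For code that also has to run on non-Debian-based systems, one option is
to probe for C.UTF-8 at run-time and fall back if it is missing. A
minimal sketch in Python; the pick_locale helper and the fallback order
are illustrative assumptions, not something ibus or any other package is
known to do:

```python
import locale

# Hypothetical preference order: C.UTF-8 first, then a national UTF-8
# locale that may or may not have been generated, then plain C.
CANDIDATES = ("C.UTF-8", "en_US.UTF-8", "C")


def pick_locale(candidates=CANDIDATES):
    """Return the first name that setlocale() accepts for LC_CTYPE.

    The process's previous LC_CTYPE setting is restored before returning,
    so this only probes availability.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
                return name
            except locale.Error:
                continue  # not available on this system, try the next one
        return "C"  # the C locale is always available
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)
```

On wheezy or later this should normally return "C.UTF-8"; elsewhere the
result depends on which locales have been generated.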
> I see several ways to fix packaging of ibus:
>
> * Patch the upstream source with s/en_US.UTF-8/C.UTF-8/g .
> * Add locales-all to the package dependencies.
> * (If locales package always set up en_US.UTF-8, I am off the hook.)
The first sounds correct here based on your description of the problem.
The second is an unnecessarily heavyweight dependency, and the third
would privilege a particular national locale for a not very good reason.
> I guess some apps may be in the same situation. How are other DDs
> coping with apps which temporarily set their internal locale value to
> the default en_US.UTF-8?
If all your package needs is to extend character handling to cover UTF-8
rather than merely ASCII, then using the C.UTF-8 locale is the correct
fix on Debian.
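
As an illustration of the simple case, a program that temporarily
switches locale for a child process can use C.UTF-8 where it previously
used en_US.UTF-8. A minimal sketch in Python; the utf8_c_env helper is
hypothetical, and clearing the other locale variables is an assumption
about what such a program wants:

```python
import os


def utf8_c_env(base=None):
    """Return a copy of the environment that selects the C.UTF-8 locale.

    LANG, LANGUAGE and any LC_* variables are dropped so that nothing
    overrides or supplements the single LC_ALL setting.
    """
    env = dict(os.environ if base is None else base)
    for key in list(env):
        if key.startswith("LC_") or key in ("LANG", "LANGUAGE"):
            del env[key]
    env["LC_ALL"] = "C.UTF-8"
    return env
```

The result can be passed as the env argument of subprocess.run() so that
only the child sees the change.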
There are some packages which care about more details of the locale, for
a variety of good and bad reasons (I've seen ones that naïvely decompose
it and try to use the language part as a directory name, for instance),
and those are more difficult to handle. If this is done at build-time
then I would normally suggest build-depending on locales (not
locales-all) and using localedef to generate a temporary locale for the
course of the build. If this is done at run-time then I might
reluctantly accept the need for a dependency on locales-all, but it's
usually a sign that some refactoring is needed.
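
The build-time approach could look roughly like the following. This is a
sketch only, in Python, assuming the build needs en_US.UTF-8 and that
localedef (from the locales build-dependency) is on the PATH; the helper
names and the output directory are hypothetical:

```python
import os
import subprocess


def localedef_command(outdir, source="en_US", charmap="UTF-8"):
    """Build the localedef invocation that compiles one locale into outdir."""
    # The output argument contains a slash, so localedef treats it as a
    # directory path rather than a name under the system locale directory.
    target = os.path.join(os.path.abspath(outdir), f"{source}.{charmap}")
    return ["localedef", "-i", source, "-f", charmap, target]


def build_env(outdir, source="en_US", charmap="UTF-8", base=None):
    """Return an environment that makes glibc look for locales in outdir."""
    env = dict(os.environ if base is None else base)
    env["LOCPATH"] = os.path.abspath(outdir)
    env["LC_ALL"] = f"{source}.{charmap}"
    return env


def generate_temp_locale(outdir, source="en_US", charmap="UTF-8"):
    """Compile the locale into outdir and return the matching environment."""
    os.makedirs(outdir, exist_ok=True)
    subprocess.run(localedef_command(outdir, source, charmap), check=True)
    return build_env(outdir, source, charmap)
```

The returned environment would be used only for the build or test
commands that need the locale, and the directory is discarded with the
rest of the build tree.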
--
Colin Watson [cjwatson@debian.org]