
Re: Some strange locale selections



[Hendrik Sattler]
> Hi,
> 
> what exactly is the difference between the locales
> de_DE.UTF-8
> de_DE.UTF-8@euro
> de_DE
> de_DE@euro
> ?

You can see that /usr/share/i18n/locales/de_DE@euro copies all fields
from de_DE, so the only difference is the charset used when compiling
locales.
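The copy directives can be listed straight from the locale source. This is a hypothetical sketch: the helper name `list_copies` is made up, and it assumes the usual glibc source layout, where each category section starts with its LC_* name and may consist of a single `copy "<locale>"` line.

```sh
#!/bin/sh
# Hypothetical helper: for a glibc locale source file, print each category
# that uses a copy directive, followed by the locale it copies from.
list_copies() {
    [ -r "$1" ] || return 1    # quietly give up if the file is not readable
    awk '
        /^LC_[A-Z]+/ { category = $1 }    # a new category section begins
        $1 == "copy" { gsub(/"/, "", $2); print category, $2 }
    ' "$1"
}

list_copies /usr/share/i18n/locales/de_DE@euro
```

Lines starting with `END LC_...` do not match the section pattern, so only the opening line of each category updates the current name.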

> The two with UTF-8 are probably for a default of UTF-8 and de_DE is for
> ISO-8859-1 and de_DE@euro for ISO-8859-15. Or am I already wrong here?

You are entirely right.

> All should have the EUR sign set as currency,

Still right.

> thus making it rather a strange thing to have those as selections.
> Additionally, the used charset can be specified at a later point thus
> making "de_DE.ISO-8859-15" the same as "de_DE@euro"

No, try for instance running
  $ LC_ALL=de_DE.ISO-8859-15 locale charmap
The charset is hardcoded when locales are compiled (see below for a
rationale) and cannot be changed afterwards.  For this reason, upstream
does not want to change the encoding of existing locales, because this
would cause compatibility problems.  So when European locales had to
switch from ISO-8859-1 to ISO-8859-15, new @euro locales were added,
not because their currency changed (such changes happen all the time,
e.g. in Turkey just now), but because of the charset.
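To check several names at once, one can loop over them and ask `locale charmap` about each. This is a hypothetical sketch (the name `show_charmaps` is made up). It assumes a glibc system, where a locale name that was never compiled normally makes `locale` fall back to the C locale and report that locale's charmap (ANSI_X3.4-1968) instead of the charset written in the name.

```sh
#!/bin/sh
# Hypothetical helper: print the charmap the C library reports for each
# locale name. A name that was never compiled makes `locale` warn on stderr
# (discarded here) and typically fall back to the C locale.
show_charmaps() {
    for name in "$@"; do
        charmap=$(LC_ALL="$name" locale charmap 2>/dev/null)
        # "unknown" means the locale utility printed nothing at all.
        printf '%-20s %s\n' "$name" "${charmap:-unknown}"
    done
}

show_charmaps de_DE de_DE@euro de_DE.ISO-8859-15 de_DE.UTF-8 de_DE.UTF-8@euro
```

If only the four locales from the original question are compiled, the de_DE.ISO-8859-15 line would be expected to show the C locale's charmap rather than ISO-8859-15.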

> and "de_DE.UTF-8" should probably work when only creating "de_DE" from
> the above list.

No, run
 $ localedef --no-archive -f ISO-8859-1 -i /usr/share/i18n/locales/de_DE \
   /tmp/de_DE
 $ localedef --no-archive -f UTF-8 -i /usr/share/i18n/locales/de_DE \
   /tmp/de_DE.UTF-8
and compare the file sizes of /tmp/de_DE*/LC_*.
/tmp/de_DE.UTF-8/LC_COLLATE is much larger because it can handle all
Unicode characters, whereas the ISO-8859-1 version only recognizes 256
characters.  The same is true for LC_CTYPE.  So you cannot compute the
de_DE.UTF-8 version from de_DE; doing the opposite may be feasible, but
looks pretty useless.  In fact, a simpler solution is to improve the
debconf questions asked when locales are selected, but nobody has
volunteered to do the job yet.
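To read the size comparison from the two localedef commands more easily, a small helper can print both directories side by side. This is only a sketch: `compare_locale_dirs` is a hypothetical name, and it assumes both directories were produced with `localedef --no-archive` as shown. Subdirectories are skipped because localedef typically writes LC_MESSAGES as a directory rather than a single file.

```sh
#!/bin/sh
# Hypothetical helper: print the size in bytes of every regular LC_* file
# present in both directories: name, size in $1, size in $2.
compare_locale_dirs() {
    for f in "$1"/LC_*; do
        # Skip an unmatched glob and subdirectories such as LC_MESSAGES/.
        [ -f "$f" ] || continue
        base=${f##*/}
        [ -f "$2/$base" ] || continue
        # Arithmetic expansion drops any padding some wc versions add.
        size1=$(( $(wc -c < "$f") ))
        size2=$(( $(wc -c < "$2/$base") ))
        printf '%-16s %10s %10s\n' "$base" "$size1" "$size2"
    done
}

compare_locale_dirs /tmp/de_DE /tmp/de_DE.UTF-8
```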

> Additionally, "de_DE.UTF-8" should be absolutely equal to
> "de_DE.UTF-8@euro"

It is.

> (shouldn't that be "de_DE@euro.UTF-8" anyway).

No, the latest convention is language_territory.charset@modifier.
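The convention can be illustrated with a small parser. This is a simplified, hypothetical sketch (`parse_locale` is a made-up name) that splits on `@` first, then `.`, then `_`; as far as I understand, glibc also treats everything after `@` as the modifier. With the order de_DE@euro.UTF-8, the whole string `euro.UTF-8` therefore ends up as the modifier and no charset is recognized.

```sh
#!/bin/sh
# Hypothetical parser for language_territory.charset@modifier.
# Uses plain POSIX parameter expansion; variables are global (no `local`).
parse_locale() {
    name=$1
    modifier=
    case $name in
        *@*) modifier=${name#*@}; name=${name%%@*} ;;  # everything after @
    esac
    charset=
    case $name in
        *.*) charset=${name#*.}; name=${name%%.*} ;;   # between . and @
    esac
    territory=
    case $name in
        *_*) territory=${name#*_}; name=${name%%_*} ;; # between _ and .
    esac
    printf 'language=%s territory=%s charset=%s modifier=%s\n' \
        "$name" "$territory" "$charset" "$modifier"
}

parse_locale de_DE.UTF-8@euro
parse_locale de_DE@euro.UTF-8
```

For example, `parse_locale de_DE.UTF-8@euro` prints `language=de territory=DE charset=UTF-8 modifier=euro`.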

> So why are there 4 selections and not just 1? If this is just about
> default charset, wouldn't it be better to provide a README instead of
> duplicated locales?  If I am wrong about this, maybe there's a need to
> explain the selectable things a bit?

If you can write some useful explanations, it is likely that glibc
maintainers will include them.

> If the selection of the charset at a later point does not work, then
> at least "de_DE.UTF-8@euro" is absolutely useless

Agreed, see http://bugs.debian.org/274491
Feel free to add comments to this bug report in order to convince the
glibc maintainers to change their minds.

> (and the locale system would really be bad).

I do not understand this last sentence.

Denis


