Re: default character encoding for everything in debian
On Wed, Aug 12, 2009 at 09:56:49AM +0200, Samuel Thibault wrote:
> Giacomo A. Catenazzi, on Wed 12 Aug 2009 08:03:30 +0200, wrote:
> > Bastian Blank wrote:
> > > On Tue, Aug 11, 2009 at 09:40:35PM +0200, Bernd Eckenfels wrote:
> > >> In article <20090811183800.GE5487@const.famille.thibault.fr> you wrote:
> > >>> Not necessarily. Any sane implementation should just use wchar_t
> > >> Which could be UTF-16 and therefore still has complicated length semantics.
> > >
> > > No, wchar_t is UCS-4 (or UCS-2 in esoteric implementations like
> > > Windows).
> > No, wchar_t is locale dependent (per POSIX).
> What do you mean? The compiler can't know the locale in advance for
> the width and endianness. The value might depend on the locale, yes,
> but that's not a problem as long as you convert into UTF-8 before
> communicating with other applications.
> On sane systems (Debian systems are), it's just always UCS-4.
Specifically, __STDC_ISO_10646__ is defined to indicate that wchar_t
is always UCS-4 in all locales.
> > BTW on gcc:
> > -fwide-exec-charset=charset
> > Set the wide execution character set, used for wide string and
> > character constants.
> It hurts when I shoot myself in the foot.
This feature of GCC is one of the more obscure areas of locale
handling. How does the encoding of strings at the level of
individual translation units work with a single per-process
global locale and C formatted I/O? Curious minds would like to know.
> > The default is UTF-32 or UTF-16, whichever corresponds to the width of
> > wchar_t.
> This documentation is bogus BTW. It should read "UCS-4 or UCS-2".
It's "strictly" correct according to the standard.
See http://en.wikipedia.org/wiki/UTF-32/UCS-4 for an overview.
.''`. Roger Leigh
: :' : Debian GNU/Linux http://people.debian.org/~rleigh/
`. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/
`- GPG Public Key: 0x25BFB848 Please GPG sign your mail.