
Re: Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale



On Wed, Apr 08, 2009 at 09:41:18AM +0200, Giacomo A. Catenazzi wrote:
> Roger Leigh wrote:
>> I wasn't aware that this level of checking was performed, though
>> it does make sense.  But, does it not reject non 7-bit input in the C
>> locale for completeness?
>>
>> Should tools doing "raw" I/O not be using lower level interfaces
>> such as fread() and fwrite() rather than the "formatted" print
>> functions which are specified to behave in a locale-dependent
>> manner? 
>
> printf is not locale dependent, except for numeric display
> (and possibly some extensions).

Each C FILE* stream has an associated locale; see
struct _IO_FILE_complete in glibc's libio.h.
The example program I posted demonstrates that this does actually
happen: the output streams use the current locale, and there is
a UTF-8 [narrow]/UCS-4 [wide] conversion to the locale codeset on
output.

When you output a string to a stream, there is a conversion step
from the execution character set (either narrow or wide) to the
codeset of the stream's associated locale.  I haven't yet found
documentation of exactly where this happens (it's all in the libc
internals), but I would hazard a guess that all the "string"
functions go through this step, whereas the lower-level byte-based
I/O functions skip it.

This machinery is also used by the C++ iostream locale imbue()
mechanism.

So while printf itself might not do the conversion, it's done
at some point, probably when printf copies the formatted string
to the stream buffer.
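
As a rough illustration of the distinction (not of where libc
performs the conversion internally), here is a minimal sketch.  The
helper names wide_fits_locale() and raw_bytes_survive() are made up
for this example.  The first asks libc whether a wide string can be
converted to the codeset of the current LC_CTYPE locale; the second
checks that fwrite() hands bytes to the stream unchanged.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: returns 1 if ws can be converted to the
   codeset of the current LC_CTYPE locale, 0 otherwise. */
int wide_fits_locale(const wchar_t *ws)
{
    mbstate_t state;
    const wchar_t *src = ws;

    memset(&state, 0, sizeof state);
    /* With a null destination, wcsrtombs() only computes the
       required length; it returns (size_t)-1 if a character has
       no representation in the locale's codeset. */
    return wcsrtombs(NULL, &src, 0, &state) != (size_t)-1;
}

/* Hypothetical helper: writes len bytes with fwrite() to a
   temporary file, reads them back, and returns 1 if they are
   byte-for-byte identical. */
int raw_bytes_survive(const char *bytes, size_t len)
{
    char buf[64];
    FILE *fp;
    size_t n;

    assert(len <= sizeof buf);
    fp = tmpfile();
    if (fp == NULL)
        return 0;
    if (fwrite(bytes, 1, len, fp) != len) {
        fclose(fp);
        return 0;
    }
    rewind(fp);
    n = fread(buf, 1, sizeof buf, fp);
    fclose(fp);
    return n == len && memcmp(buf, bytes, len) == 0;
}
```

In the "C" locale, which is what a program gets before calling
setlocale(), I would expect wide_fits_locale() to reject a string
containing U+20AC, while raw_bytes_survive("\xe2\x82\xac", 3)
succeeds because fwrite() never looks at the locale.  After
setlocale(LC_ALL, "") in a UTF-8 locale the first check should
pass as well.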


Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.

