
Re: Squeeze can't fit on 512MiB



On Fri, Oct 29, 2010 at 11:36:59AM +0200, Adam Borowski wrote:
> 
> I really wonder why you still need to install "locales" to get UTF-8.  Even
> in current glibc, it's a second class citizen.  Several years ago, I
> benchmarked a mockup of hard-coding UTF-8 the way ISO-8859-1 and KOI8-R were
> done in the past, and it shaved 20% off the whole
> fork-exec-ld-setlocale-getopt-...-exit sequence almost every program does.
> The character classification tables are needlessly duplicated for every
> locale as well -- try an ISO-8859-1 locale and look at iswfoo() for chars >0xFF,
> even though there's a separate copy per locale, for all but C and POSIX it's
> identical.

Bug #522776 has quite a bit of information about basic UTF-8 support without
locales (the creation of a C.UTF-8 locale).  Towards the end of the report
there was talk of getting C.UTF-8 into squeeze, but I'm not sure what the
status of that work is at present.  It's a trivial glibc tweak to generate
and package the additional locale.

Do you still have your patch for hard-coding UTF-8?  I did start doing this
myself, but didn't get as far as having a working locale.  Your patch might
be a good starting point if it still works with current glibc.

I agree the duplication of character tables in glibc is totally insane; a
single copy of the tables for each character set is plenty, and having both
ASCII and UTF-8 hard-coded into glibc would be a major performance
improvement, though it would require eliminating the duplication on locale
loading.  Having the entire UTF-8 table duplicated for each different
locale you use is just mad.


Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.
