On Fri, Oct 29, 2010 at 11:36:59AM +0200, Adam Borowski wrote:
>
> I really wonder why you still need to install "locales" to get UTF-8.  Even
> in current glibc, it's a second class citizen.  Several years ago, I
> benchmarked a mockup of hard-coding UTF-8 the way ISO-8859-1 and KOI8-R were
> done in the past, and it shaved 20% of the whole
> fork-exec-ld-setlocale-getopt-...-exit sequence almost every program does.
> The character classification tables are needlessly duplicated for every
> locale as well -- try an ISO-8859-1 and look at iswfoo() for chars >0xFF,
> even though there's a separate copy per locale, for all but C and POSIX it's
> identical.

#522776 has quite a bit of information about basic UTF-8 support
without locales (creation of C.UTF-8).  From the end of the report,
there was talk of getting C.UTF-8 into squeeze, but I'm not sure what
the status of that work is at present (it's a trivial glibc tweak to
generate and package the additional locale).

Do you still have your patch for hard-coding UTF-8?  I did start doing
this, but didn't get as far as having a working locale.  It might be
a good starting point if it still works with current glibc.

I agree the duplication of character tables in glibc is totally insane;
a single copy of each character set is more than plenty, and having
both ASCII and UTF-8 hard-coded into glibc would be a major
performance improvement, though it would require eliminating the
duplication on locale loading.  Having the entire UTF-8 table
duplicated for each different locale you use is just mad.


Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.