
Re: Bug#292330: project: UTF-8 as default



On Sun, 30 Jan 2005, Glenn Maynard wrote:
> Chinese and Japanese text at once.  If you're a Japanese user, converting

Yes.

> with a Chinese font, this isn't a problem.  (A multilingual user is
> no better off with a different locale, though--if you can store files
> in different encodings, you have to tag the encoding, and if you can
> do that, the locale doesn't matter.)

You can do that with some encodings.  I seem to recall many of the ISO ones
for CJK are quite capable of doing it.  EUC certainly *isn't*, and
shift-jis isn't either (but then, EUC is a hack, and shift-jis is an ugly
hack).

> A similar problem arises when sending mail out: I want to send mail
> in local charsets, not in UTF-8, since many popular but broken

Also correct.

> However, it does: I've configured it to send mail as ISO-8859-1,
> ISO-2022-JP, UTF-8 priority.  This isn't automatic or trivial, but
> it's not black magic, either.

We are talking about defaults.  Yes, it is possible depending on the MUA
(some cannot do it).  But is it a sane *default*?
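
For reference, the priority scheme quoted above can be sketched roughly as
below.  This is only an illustration, not how any particular MUA implements
it: the function name pick_charset and the default priority list are
assumptions made for the example, and it relies on Python's built-in codecs
to decide whether a charset can represent the text.

```python
def pick_charset(text, candidates=("iso-8859-1", "iso-2022-jp", "utf-8")):
    """Return the first charset in the priority list that can encode text.

    Hypothetical helper: the name and the default list are illustrative,
    mirroring the "ISO-8859-1, ISO-2022-JP, UTF-8" order quoted above.
    """
    for name in candidates:
        try:
            text.encode(name)      # strict encoding raises on unmappable chars
        except UnicodeEncodeError:
            continue               # this charset cannot hold the text, try next
        return name
    raise ValueError("no candidate charset can encode the text")


if __name__ == "__main__":
    for sample in ("caf\u00e9", "\u65e5\u672c\u8a9e", "\ud55c\uad6d\uc5b4"):
        print(repr(sample), "->", pick_charset(sample))
```

The point for the defaults discussion is that somebody has to choose and
configure that priority list per user; nothing in the locale supplies it.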

> I don't claim UTF-8 is usable, yet, for all languages and all environments
> (eg. input method support, perhaps), but I *have* done some research
> on CJK unification ("done my homework"), and I haven't seen how it's a
> difficult problem for a single-language user.  (I may, of course, have missed
> something obvious--feel free to point it out.)

Many of us are *NOT* single-language users, as in "single-charset-using"
language users, even.  But that's not where the problem lies.

The problem is with the idea of using UTF-8 as a *default* for *all* locales
right now.  That means one has to know how to deal with charsets (most
people don't even know what a charset is.  I hope the 0-8-15 :) Debian user
does, but...), and that one has to go around fixing the charset setup in
most applications.  This would make UTF-8 a bad default.

Of course, when we (who live in a non-ASCII world) start to feel like the
above is not needed for more than 50% of the users for our locale, then it
is a good time to switch _that locale_ to UTF8, IMHO.

> (Even Windows manages to get font selection right: despite the fact that

I'd expect Windows to do better on THAT than most Unix applications.  But I
have not tested e.g. Firefox to make sure it is that smart.

> > Also I can assure you 80% of the mail I see getting through the mail servers
> > I admin is either latin-1 encoded, or that Windows CP1252 monstrosity
> > (often mistagged as latin-1).  Too much of it without any sort of charset
> > declarations at all, since too many people use extremely crappy software.
> > It is even worse for web pages.
> 
> Mutt automatically attempts to guess the charset of incoming data when
> they have no content-type--the charset of the local system is completely
> irrelevant to that.  Mutt detects incoming mail the same way regardless
> of whether my locale is UTF-8 or ISO-8859-1.  (I don't know about other
> mailers, but this seems very basic.)  The same applies for browsers.

It is not basic at all :(
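
To show why, here is a minimal sketch of guessing the charset of untagged
bytes.  It is not mutt's actual algorithm; the function name guess_decode
and the fallback order are assumptions for the example.  It tries strict
UTF-8 first, then CP1252, then latin-1 (which accepts every byte and so
never fails).

```python
def guess_decode(raw, fallbacks=("utf-8", "cp1252", "latin-1")):
    """Decode untagged bytes by trying charsets in order.

    Hypothetical helper: returns (text, charset_used).  The order is an
    assumption; a different order gives different answers for the same
    bytes, which is exactly why this is guessing and not detection.
    """
    for name in fallbacks:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue               # bytes are invalid in this charset
    # Only reachable if the caller passes a list without a catch-all charset.
    raise ValueError("no fallback charset could decode the data")


if __name__ == "__main__":
    # Bytes 0x93/0x94 are curly quotes in CP1252 but C1 control characters
    # in latin-1, so mail mistagged as latin-1 shows garbage for them.
    raw = b"\x93quoted\x94"
    print(guess_decode(raw))
    print(repr(raw.decode("latin-1")))
```

Note that none of this consults the locale, but also that it can only tell
UTF-8 from "some 8-bit charset"; it cannot tell latin-1 from latin-2 or
KOI8-R, because any byte sequence is valid in all of them.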

> Again, the locale is irrelevant here, though some software might use
> the locale to determine the default.

Sort of.  Most software will use the charset of your default locale unless
you configure it differently.  A legacy charset there is usually much nicer
to the broken software that *others* have, so chances are you are going to
want it that way.

And in that case, you have to configure your application to change charsets
(even mutt).  If you are going to do that for enough applications, how is
UTF8 a sane *default* value for that locale?

I expect UTF8 will be a good idea in Brazil in about two to three years.  I
will not presume to know that for any other locale.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh


