
Bug#292330: marked as done (project: UTF-8 as default)



Your message dated Fri, 4 Apr 2008 20:37:22 +0200
with message-id <20080404183722.GA13875@benz.df7cb.de>
and subject line Re: project: UTF-8 as default
has caused the Debian Bug report #292330,
regarding project: UTF-8 as default
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
292330: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=292330
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: project
Severity: wishlist
Tags: l10n


Most basic problems with use of UTF-8 (both in languages and standard
libraries) should have been fixed now, and as I see it, it's time to
head for easier integration of UTF-8, system-wide.

By this, I don't mean enforcing this character encoding on the
whole Debian system, but making sure that:
1) Installing systems with UTF-8 is easier, even with locales that do
not strictly need it. UTF-8 as default is not necessarily my
ultimate goal (despite what the title suggests); the goal is to have
the option of using UTF-8 (or other encodings) system-wide, no matter
what languages are chosen.
2) All Debian packages handle UTF-8 properly.

The problem with choosing one character encoding per language is
multilingual environments. When one language suggests one encoding and
another language suggests something else, trying to mix these languages
will always give you unreadable text, one way or another.
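
A minimal sketch of that failure mode, in Python. The sample strings and
the pairing of KOI8-R with ISO-8859-1 are assumptions chosen for
illustration; they do not come from the report. It shows bytes written in
one legacy encoding being misread in another, and a mixed-language string
that neither legacy encoding can store while UTF-8 can.

```python
# Illustrative sketch: sample text and encodings are assumptions.

def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

russian = "Привет"

# Bytes written by a system that assumes KOI8-R ...
koi8_bytes = russian.encode("koi8_r")

# ... read back by a system that assumes ISO-8859-1. Decoding never
# fails, because every byte is a valid Latin-1 character, but the
# result is unrelated Latin letters and symbols.
misread = koi8_bytes.decode("latin-1")

# Reading with the matching encoding recovers the text.
roundtrip = koi8_bytes.decode("koi8_r")

# A string mixing Cyrillic with a Latin letter outside ASCII.
mixed = "Привет, blåbær"

fits = {enc: can_encode(mixed, enc) for enc in ("koi8_r", "latin-1", "utf-8")}

if __name__ == "__main__":
    print(repr(misread))
    print(roundtrip)
    print(fits)
```

Only the UTF-8 entry in `fits` is true: each legacy encoding is missing
the characters of the other language.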

As written in http://www.jw-stumpel.nl/stestu.html:
"Traditionally, for storing texts in various languages, special encoding
methods are used, for instance Latin-1 (1 byte per character) for
West-European languages with accented letters, KOI-8 for Russian, or
EUC-JP (2 bytes per character) for Japanese.

Only very limited 'mixing' of languages (..) is possible in these
systems."


Some examples:
1)
I've been working in Eritrea lately, setting up computers in a school.
Eritreans have nine official languages, all treated equally.
One is Arabic, using the Arabic script, of course.
Two, Tigre and Tigrinya, use an ancient script called Ge'ez. It is
written left to right like Western scripts, but its more than two
hundred letters look nothing like Latin.
The rest of the languages use the Latin alphabet.
On top of that, the official language in school, from secondary level
up, is English. That doesn't stop them from wanting to use their own
languages from time to time.
So the situation is this:
They'll mostly use English, but sometimes other languages, covering
up to three scripts. This applies to documents, file names, et cetera.
And even when using English desktop settings, they'll want to be able
to read these other scripts.
The only option is to use UTF-8 on the whole system, no matter what
language.

2)
There's an ethnic minority in this country of mine, called the Sami.
They have their own language. Basically they use Latin characters, but
with some extra letters that UTF-8 covers and ISO-8859-1 does not.
The rest of us use ISO-8859-1, also called ISO Latin-1, a popular
eight-bit charset.
Now: Most of us only see the Sami language occasionally. We can't even
read it, so it doesn't bother us if ISO-8859-1 is the default.
Debian-installer enforces it quite heavily.
But some people use both languages, in varying proportions.
So what do you choose as your default, when the most popular option
will give you gibberish in every second word?
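
As a rough check of which letters are affected, the Python sketch below
tests the accented letters of the Northern Sami alphabet against
ISO-8859-1. Treating Northern Sami as the variety meant here is an
assumption; the report does not say which Sami language is involved.

```python
# Illustrative sketch: using the Northern Sami letters is an assumption.

sami_letters = "áčđŋšŧž"

def fits(ch: str, encoding: str) -> bool:
    """Return True if the single character ch exists in the encoding."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Letters that an ISO-8859-1 system cannot store at all.
missing = [ch for ch in sami_letters if not fits(ch, "iso-8859-1")]

# What survives if the text is forced into ISO-8859-1 anyway:
# unsupported letters are replaced by "?".
lossy = sami_letters.encode("iso-8859-1", errors="replace").decode("iso-8859-1")

# UTF-8 stores every letter and returns it unchanged.
intact = sami_letters.encode("utf-8").decode("utf-8")

if __name__ == "__main__":
    print(missing)
    print(lossy)
    print(intact)
```

Only "á" survives the forced conversion; the other six letters come out
as question marks.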


So:
For people only using English, it doesn't matter. Nor does it matter
much in Western Europe.
But the rest of the world uses several languages and even several
scripts, especially when using computers, English-dominated as they are.
Character encodings that do not support all characters can only be used
for a few languages at a time. Red Hat solved this a long time ago, so
why can't we?

I think it's time to wake up and smell the coffee.

-- System Information:
Debian Release: 3.1
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: i386 (i686)
Kernel: Linux 2.6.8-1-386
Locale: LANG=nb_NO, LC_CTYPE=nb_NO (charmap=ISO-8859-1)


--- End Message ---
--- Begin Message ---
UTF-8 is the default locale in installs as of etch, so I'll close this
bug.

For problems with individual programs (improper string handling, etc)
please file bugs on the respective packages.

Christoph
-- 
cb@df7cb.de | http://www.df7cb.de/



--- End Message ---
