Bug#99324: Default charset should be UTF-8

To: 99324@bugs.debian.org
Subject: Bug#99324: Default charset should be UTF-8
From: Radovan Garabik <garabik@melkor.dnp.fmph.uniba.sk>
Date: Mon, 11 Jun 2001 10:41:13 +0200
Message-id: <20010611104113.A15114@melkor.dnp.fmph.uniba.sk>
Reply-to: Radovan Garabik <garabik@melkor.dnp.fmph.uniba.sk>, 99324@bugs.debian.org

Raul Miller <moth@debian.org> wrote:

> > Converting all documentation to utf-8 is ridiculous, and unnecessary.
> 
> Do you think the currently proposed policy (documentation should be

Please read the proposal carefully (especially Marco and Junichi).
Writing documents in (or converting them to) UTF-8 is a "should"
(IMHO it should be a "must", but Debian maintainers do not seem to be
ready for that yet),
and only for Debian control files and English-language documentation
(if any non-English characters occur there).
For documentation in other languages, it is merely an encouragement.

> written in unicode, packages should use the same encoding for all its
> documentation) is in some sense bad?  If so, could you suggest a better
> phrasing?

Well, there is one issue I thought of: a package can include
documentation in different encodings (such as README.koi8, README.ascii,
README.alt). This should be allowed. Perhaps the sentence
"A package may (at the discretion of the maintainer) include documentation
files in other encodings, if they are also present in the canonical encoding,
and if the encodings used are clearly marked"
should be added to the proposal?
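
To illustrate what "also present in the canonical encoding" could mean in
practice, here is a minimal sketch of producing the UTF-8 copy of a
KOI8-R document. The helper name to_utf8 is hypothetical and not part of
the proposal; the sketch assumes the source encoding is known (for
example from a file name such as README.koi8), since it cannot be
detected reliably.

```python
# Hypothetical helper (not from the proposal): transcode a document whose
# legacy encoding is already known into UTF-8.
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes in the declared legacy encoding and re-encode as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# A Russian word as it would be stored in a KOI8-R file such as README.koi8.
koi8_bytes = "Привет".encode("koi8_r")

# The same text in the canonical encoding.
utf8_bytes = to_utf8(koi8_bytes, "koi8_r")
```

The two byte strings differ, but both decode to the same text, so the
maintainer could ship both files side by side.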

Jürgen A. Erhard <juergen.erhard@gmx.net> wrote:

>     >*Addition to 13.5 Preferred documentation formats:
>     >
>     >HTML documents, if in encoding other than us-ascii, must
>     >have in their header an appropriate META tag describing the used encoding.
> 
> Shouldn't that be "iso-8859-1 (latin1)" instead of "us-ascii"?  As,
> IIRC, that is the official default encoding for HTML (according to
> RFC2854/RFC2616).

It used to be, in HTML 2.0, but HTML 4.0 says it is ISO 10646 (but
does not say whether that means UTF-8, UTF-16 or even UCS-4 directly).
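
For what it is worth, checking whether an HTML document carries such a
META tag is easy to automate. The following is only a sketch, assuming
the usual HTTP-EQUIV="Content-Type" form of the tag; the names
MetaCharsetFinder and declared_charset are my own and do not come from
any policy text or tool.

```python
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Remember the charset from the first Content-Type META tag seen."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta" or self.charset is not None:
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").lower() != "content-type":
            return
        # CONTENT looks like "text/html; charset=koi8-r".
        for part in attributes.get("content", "").split(";"):
            name, _, value = part.strip().partition("=")
            if name.strip().lower() == "charset" and value:
                self.charset = value.strip().lower()


def declared_charset(html_text):
    """Return the charset declared in a META tag, or None if there is none."""
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    return finder.charset
```

A document for which declared_charset() returns None would be one that
the proposed addition to 13.5 asks to be fixed, unless it is pure us-ascii.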

> I'm unsure about the must/should, though... I mean, "should" should
> also be stuff that's not really critical, right?  But a document that
> I can't even get any reader to read due to not knowing which encoding
> it's in (BTW, I liked your story about the ECMA-cyrillic doc ;-) is
> not non-critical.  But IANADD anyway...

I originally wanted "must" but it generated such an uproar (although 
generated mostly by a few louder developers :-)) that I retracted it.

Marco d'Itri <md@Linux.IT> wrote:

>  >TELL ME HOW IN THE HELL I CAN WRITE A MAIL WITH WORDS FROM
>  >HUNGARIAN, SLOVAK, RUSSIAN AN JAPANESE TOGETHER!!!!
> You and him configure your MUAs to use some unicode encoding and deal
> with any resulting problem which may happen.
> No need to force it on everybody else.

sorry to spoil your day, but this is already being forced by someone
with much greater authority than me (http://www.imc.org/mail-i18n.html)
"""
   Recommendation: All mail-creating programs created or revised after January 1, 1999, must
   be able to create mail using the UTF-8 charset. Another way to say this is that any
   program created or revised after January 1, 1999, that cannot create mail using the UTF-8
   charset should be considered deficient and lacking in standard internationalization
   capabilities. Of course, all mail-creating programs should try to meet this requirement as
   early as possible.
"""
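
As a rough illustration of what "create mail using the UTF-8 charset"
amounts to, here is a sketch using Python's standard email package. The
addresses, the subject and the function name build_multilingual_mail are
made up for the example; nothing here is prescribed by the
recommendation itself.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_multilingual_mail(body):
    """Build a plain-text message whose body is declared as UTF-8."""
    msg = EmailMessage()
    msg["Subject"] = "UTF-8 test"
    msg["From"] = "sender@example.org"      # placeholder address
    msg["To"] = "recipient@example.org"     # placeholder address
    msg.set_content(body, charset="utf-8")
    return msg


# Hungarian, Slovak, Russian and Japanese words in a single body.
body = "árvíztűrő, ďakujem, привет, こんにちは"

# Serialize as it would go over the wire, then parse it back.
wire = build_multilingual_mail(body).as_bytes()
parsed = message_from_bytes(wire, policy=policy.default)
```

The parsed message reports utf-8 as its charset and gives back all four
languages intact, which is exactly what no single 8-bit charset can do.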

>  >Unicode was not panacea, but it solved most of the problems,
>  >although setting it up was not painless.
> Good. Maybe you should consider writing the Unicode-HOWTO, so other
> people will be able to switch more easily to unicode.

http://www.linuxdoc.org/HOWTO/Unicode-HOWTO.html

>  >and _you_ want to continue with status quo.
> No, I want to leave people and national communities the freedom to chose
> what is best for them.

I will not comment on the situation in Japan, since
I obviously know nothing about it (although common sense says
it is better to have one encoding instead of several incompatible ones),
but I can comment on the situation for Slovak and Russian, and
believe me, being able to use unicode would be a godsend.

>  >Granted, unicode might not be ready for Japanese.
>  >But, should we wait until it is ready?
> Yes. I have no desire to suffer because you consider more elegant
> switching everything to unicode right now.

1) nobody forces you _right now_
2) nobody forces you to switch everything, _either now or in the future_
3) do you have _any_ 8-bit character in your packages?
   No, you do not. So even if we switch descriptions
   and documentation to unicode _right now and unconditionally_, you do not
   need to lift a finger. So do
   not talk about your personal suffering.
4) if you are talking about your precious ISO-8859-1 console,
   be happy Debian came preconfigured with it, so you did not have
   much work setting it up.
I guess if the preconfigured software were immediately usable by
more people in more languages, it would be an improvement.

>  >that everybody has the same luck and does not feel the problem.
> The fact _you_ have some problems does not mean I should suffer because
> of them.

The fact that Italians (and others) had problems with accented characters
does not mean Americans should suffer and convert from 7-bit ASCII
to an 8-bit charset. Yet, they did.

-- 
 -----------------------------------------------------------
| Radovan Garabik http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__    garabik @ melkor.dnp.fmph.uniba.sk     |
 -----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!
