Bug#99933: Bug#99324: Default charset should be UTF-8

On Mon, Jun 11, 2001 at 12:34:40PM -0400, Raul Miller wrote:
> On Mon, Jun 11, 2001 at 04:47:18PM +0200, Radovan Garabik wrote:
> > The situation is IMHO quite similar to German using the Fraktur
> > (Sütterlin) script - it is a latin script, and the unicode consortium
> > (IMHO rightfully) decided that it is a typesetting difference - not an
> > encoding one (you can - and sometimes you do - typeset english text
> > using Fraktur fonts, after all). If Germans were using it still today,
> > you would have exactly the same problems as with CJK scripts now (of
> > course, the complexity of CJK is much greater than that of latin
> > scripts)
> I disagree.  The Han Unification issue is more like the difference
> between the latin and the italic character sets.  Yes, many characters

No, because latin (upright) and italics are used interchangeably,
whereas fraktur carries an implicit connotation of the language used -
just like the different glyphs for the unified CJK charset.

> are similar, however there are also some characters which are unique to
> each representation.
> Also, Unicode does include Fraktur characters.

But only as mathematical symbols - that is a completely different beast.

> > I am really not sure if unicode went the right way, I feel the ability
> > to display Chinese name in a Japanese document using Chinese glyphs
> > (or vice versa) is something that we should not get rid of...
> And, this could be rectified -- with Unicode 3.1, they have the code
> space to represent each major representation of the character set.

If only they started working on improving unicode instead of
talking about how bad it is (duck, run :-))

> > perhaps it should consider them to be different scripts with different
> > encodings, but  when would it stop? Making italics, boldface etc. to be
> > different characters?
> Unicode already does that.  Take a look at the mathematical alphanumeric
> symbols [1D400-1D744].  For example:

The reason for and purpose of these characters are quite different
from those of the "base" unicode characters.
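
A minimal sketch of that difference, assuming Python and its standard
unicodedata module (the helper name describe is made up for this
illustration). It shows that a character from the mathematical block is
named as a math symbol and folds back to the plain letter under NFKC
compatibility normalization, i.e. it is treated as a styled variant of
the base letter rather than as a letter of a separate script:

```python
import unicodedata

# U+1D504 lies in the mathematical alphanumeric symbols block
# mentioned in the quoted text above.
math_fraktur_a = "\U0001D504"


def describe(ch):
    """Return (Unicode name, NFKC-normalized form) for one character."""
    return unicodedata.name(ch), unicodedata.normalize("NFKC", ch)


name, folded = describe(math_fraktur_a)
print(name)    # the official character name
print(folded)  # the compatibility equivalent
```

An ordinary "A" typeset in a Fraktur font stays U+0041; only the font
changes, which is the typesetting-versus-encoding distinction argued
above.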

> > As for X11, fonts are being rapidly developed.
> For currently relevant policy it matters what actually works.

of course. That's why my proposal is very mildly worded and
gives a lot of freedom to maintainers to decide what charset they

> > > 
> > >  "Package may (at the discretion of the maintainer) include
> > >   documentation files in other encodings, if they are present also in
> > >   canonical encoding, and if the encodings used are clearly marked. 
> > >   If a particular font is required, that should be clearly marked."
> > 
> > You do not know what a particular font is... one of
> > (traditional|simplified)C,J,K, or the full font name?
> I'm not sure I understand this question (I don't know enough about
> oriental languages and fonts to give a full answer in any event).

Well, would you indicate just "this README needs a japanese unicode font",
and the user has to figure out by himself what that is,
or "this README needs -misc-fixed-*-*-*-ja-*-*-*-*-*-*-iso10646-1",
and the user is fubar when he does not have that font?

> > A more appropriate example from history is the war between EBCDIC,
> > ASCII and other proprietary encodings... thank god one and only one
> > encoding won.
> EBCDIC vs. ASCII wasn't about supported languages.

True, but the mess in encodings was quite comparable to what
exists today outside of the Latin-1 world.
And the peace ASCII brought could be compared
to the peace that (hopefully :-)) unicode will bring one day.
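
To make the comparison concrete, here is a small sketch in Python. It
assumes the standard cp037 codec as a stand-in for EBCDIC (which in
reality had many variants) and latin-1 as an example of reading bytes
with the wrong assumption:

```python
# The same letter in two incompatible encodings.
letter = "A"

ascii_bytes = letter.encode("ascii")   # single byte 0x41
ebcdic_bytes = letter.encode("cp037")  # single byte 0xC1 in this EBCDIC variant

# Reading EBCDIC bytes as if they were Latin-1 yields a different letter.
misread = ebcdic_bytes.decode("latin-1")

print(ascii_bytes.hex(), ebcdic_bytes.hex(), misread)
```

The same kind of silent misreading happens today between, say,
different 8-bit or CJK encodings when a file is not clearly marked.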

> > > I agree that, except for the oriental languages and legacy systems,
> > > unicode is just about perfect in its ability to represent scripts in
> > > many languages.
> > 
> > and that is something terribly needed today, with this
> > world wired together.
> I agree.
> However, Unicode is not a mature standard, so we need to be careful in
> places where it would cause problems.

Of course. Nobody is talking about a compulsory switch
to unicode _right now_.

