
Re: Fwd^2: Re: rendering unicode han



Yes, I understand that the rendering needs to be localized, but I
believe that is not at all out of keeping with the Unicode design
philosophy. For instance, if I had a list of words taken from
Spanish, French and English, I could not sort the whole list in any
one language-specific order, because each language induces its own
collation order. It seems to me that your friend is still talking
about glyphic variants, and glyphic variants are not grounds for
splitting one character into two.
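To make the collation point concrete, here is a minimal Python sketch. It assumes a deliberately simplified model of traditional Spanish ordering, in which "ch" and "ll" sort as letters of their own (after "c" and "l"), and compares it with plain code-point order, which for unaccented lowercase ASCII matches English alphabetical order. The key function name is hypothetical, and this is not the Unicode Collation Algorithm or any real locale's full rule set.

```python
# Toy illustration only: a simplified model of *traditional* Spanish collation,
# where "ch" and "ll" sort as single letters placed after "c" and "l".
# This is NOT the Unicode Collation Algorithm or a real locale implementation.

def traditional_spanish_key(word: str) -> list[tuple[str, int]]:
    """Sort key that treats the digraphs 'ch' and 'll' as separate letters."""
    key: list[tuple[str, int]] = []
    w = word.lower()
    i = 0
    while i < len(w):
        pair = w[i:i + 2]
        if pair in ("ch", "ll"):
            # (letter, 1) sorts after every (letter, 0), so 'ch' follows all
            # other 'c...' sequences and 'll' follows all other 'l...' ones.
            key.append((pair[0], 1))
            i += 2
        else:
            key.append((w[i], 0))
            i += 1
    return key


words = ["luz", "chico", "llama", "dedo", "cuna"]

english_order = sorted(words)                               # plain code-point order
spanish_order = sorted(words, key=traditional_spanish_key)  # toy Spanish rules

print(english_order)
print(spanish_order)
```

The same five strings, made of the same code points, come out in two different orders, so a mixed-language list has no single "language-specific" order; the ordering lives in the locale, not in the characters.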

Unicode has been insistent from the beginning on _not_ encoding
information about specific languages. Languages change far more
frequently than characters. What if next year the Chinese variant of
U+516B becomes popular for use in Vietnam? Should the character set be
rewritten to accommodate it? No: clearly the language has changed, not
the character. You need to indicate in your locale settings (including
your choice of font) which variants you want to use.
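A minimal sketch of that division of labor, assuming a renderer that receives plain text plus a font chosen from the locale. The locale table, the font names and `render_request` are hypothetical placeholders, not any real system's configuration; the only point is that U+516B is stored as the same code point everywhere, and the glyph variant is decided by whichever font the locale selects.

```python
import unicodedata

# The character under discussion: one code point, whatever the language.
EIGHT = "\u516b"

# Hypothetical locale-to-font table. The font names are illustrative
# placeholders standing in for fonts from different foundries.
FONT_FOR_LOCALE = {
    "ja_JP": "ExampleMincho-JP",
    "zh_CN": "ExampleSong-SC",
    "zh_TW": "ExampleMing-TC",
    "ko_KR": "ExampleBatang-KR",
    "vi_VN": "ExampleNom-VI",
}


def render_request(text: str, locale: str) -> tuple[str, str]:
    """Return (text, font) for a renderer. The text is passed through
    untouched; only the font, and hence the glyph variant, varies by locale."""
    font = FONT_FOR_LOCALE.get(locale, "ExampleFallback")
    return text, font


print(unicodedata.name(EIGHT))
for loc in FONT_FOR_LOCALE:
    text, font = render_request(EIGHT, loc)
    print(loc, hex(ord(text)), font)
```

Every locale hands the renderer the identical string; if a new region adopts a different variant, only the table changes, never the encoded text.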

To put it another way, imagine your theoretical article with
Chinese, Japanese, Korean and Vietnamese text displayed in it. Let's
say you're a Japanese person reading it. You use a "Japanese" locale,
which includes loading a font made by a Japanese foundry that
supports only the Japanese glyph variants. There are two possibilities:

(1) You can, through some amazing miracle of language training, read
all four languages in their native orthography. You grew up with the
Japanese variant of U+516B, so when you see it occur in the middle of
the Vietnamese block of text, you see it as just a "Japanese-friendly
font rendition" of an obviously Vietnamese character, reasonably
assume that the author meant the Vietnamese variant, which just
doesn't happen to exist in your Japanese-made font, and carry on.

(2) You can't read the other three languages anyway, so who cares what
glyphs they use? You can read U+516B in the Japanese portions of the
text, which are the only parts you understand anyway.

I really don't think this is a CJKV-specific problem. The same thing
will happen to me if I find myself in Germany writing email in English
on a German-localized email client: if I type two consecutive "s"
characters, I'm not going to be terribly surprised when they form a
ligature, one which my friends who have never read German will
mistake, at a glance, for a capital "B". It's a hazard of
localization, IMO. Unicode is not intended to create an environment in
which everyone can magically understand each other, merely one in
which pairs of people speaking the same language can understand each
other without having to use specialized versions of the software, and
in which automatic tools like grep and sed have some hope of working
correctly without knowing which language they're scanning.
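As a rough illustration of that last point, a grep- or sed-like operation on a character needs no language information at all. This Python sketch uses a hypothetical list of language-tagged segments; the tags exist only to report where matches fell, and the pattern itself is nothing but the code point U+516B.

```python
import re

# Hypothetical mixed-language document. The language tags are bookkeeping
# for the report below; the search and substitution never consult them.
segments = [
    ("ja", "八月に会いましょう"),
    ("zh", "八个人"),
    ("ko", "八道"),
    ("en", "eight people"),
]

# grep-style: the pattern is just the single code point U+516B.
pattern = re.compile("\u516b")
hits = [lang for lang, text in segments if pattern.search(text)]
print(hits)

# sed-style: replace the character with an Arabic numeral in every segment,
# again without knowing which language each segment is in.
rewritten = [pattern.sub("8", text) for _, text in segments]
print(rewritten)
```

Because the Japanese, Chinese and Korean segments share one encoded character, a single pattern finds it in all of them, whatever glyph each reader's font happens to draw.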

-graydon


--  
To UNSUBSCRIBE, email to design-request@berlin-consortium.org
with a subject of "unsubscribe". Trouble? Contact listmaster@berlin-consortium.org



-- 
[ This mail was originally sent to debian-chinese@lists.debian.org and was ]
[ forwarded to this list automatically. Big5 characters are also converted ]
[ to GB at the same time,  Please note that there may be errors during the ]
[ conversion as this is not done by a human!                               ]

