
FORW: Re: Is I18n 'wording' correct in Netscape?

Found this <Pine.HPP.3.95a.990614142113.13744D-100000@hpplus01.cern.ch> in netscape.public.mozilla.i18n:

== BEGIN forwarded message ==

From: "Alan J. Flavell" <flavell@mail.cern.ch>
Newsgroups: netscape.public.mozilla.i18n
Subject: Re: Is I18n 'wording' correct in Netscape?
Date: Mon, 14 Jun 1999 15:14:22 +0200

On Fri, 11 Jun 1999, Paul Gorodyansky wrote:

Hallo Paul, my attention was drawn to this article by deja(news),
although I don't usually read this group.  Excuse me if I say anything
out of place...

Terminology is notoriously uncertain in this area (as your cited page at
the W3C also points out), and the terms used in one context (e.g. HTML or
SGML) may have quite a different meaning than when they are used in
another context (e.g. Operating System or Word Processing).  The
confusion of character sets with fonts only makes matters worse.

> But as far as I remember from my early days of I18n
> experience when I still was reading manuals and books :)
> a Character Set is just a set of the letters of a national
> alphabet, and then different Encoding methods assign
> different code points for these letters.

I think you'll find that in this kind of context, the collection of
characters without reference to its arrangement is called a
"repertoire", e.g. Latin-1.

In an SGML context, the term "Document Character Set" has a very
definite meaning, and HTML specifications state that the only Document
Character Set in HTML is Unicode.  Specifically, this fixes the meaning
of &#number; representations in HTML.  (Earlier versions of HTML, of
course, used iso-8859-1, but this is a subset of Unicode anyway.)  The
document character set, in this sense, is not open to any kind of
negotiation or announcement between the server and the client: it is
built into the definition of HTML.
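
A minimal sketch in Python of what "fixes the meaning of &#number;"
amounts to.  The sample reference and the two Cyrillic codings are
chosen purely for illustration, and Python's standard html module
stands in for a browser's parser:

```python
import html

# "&#1046;" names Unicode code point 1046 (U+0416), whatever bytes carry the page.
ref = "&#1046;"
char = html.unescape(ref)

# Illustrative only: the same reference sent under two different codings.
as_koi8 = ref.encode("koi8-r")
as_cp1251 = ref.encode("cp1251")

# The reference itself is plain ASCII, so it is identical in both codings...
same_reference = as_koi8 == as_cp1251
# ...whereas the literal character travels as different bytes in each.
different_bytes = char.encode("koi8-r") != char.encode("cp1251")

print(hex(ord(char)), same_reference, different_bytes)
```

The number in the reference is resolved against Unicode in both cases;
only the literal character depends on the announced coding.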

I'd say that a "character set" is an arrangement of a repertoire: for
example, iso-8859-1 and cp850 are two different character sets based on
the Latin-1 repertoire.
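
As a rough illustration, assuming Python's built-in "iso-8859-1" and
"cp850" codecs and an arbitrarily chosen sample character, the same
member of the repertoire sits at a different position in each
arrangement:

```python
# One character from the Latin-1 repertoire (sample chosen for illustration).
text = "é"

# Its position differs between the two arrangements.
in_latin1 = text.encode("iso-8859-1")
in_cp850 = text.encode("cp850")

print(in_latin1.hex(), in_cp850.hex())

# Both positions still denote the same character.
round_trip_ok = in_latin1.decode("iso-8859-1") == in_cp850.decode("cp850") == text
print(round_trip_ok)
```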

In order to transmit or store Unicode characters, it's necessary to
choose a coding (a.k.a. "encoding", but this term risks confusion with a
different protocol layer, such as encoding with base-64, binhex,
uuencoding etc.). In the case of Unicode, one can choose codings utf-8,
ucs-2 etc. These codings are denoted in MIME terminology by the
unfortunately-named "charset" attribute.
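
A short sketch of the two layers, under some assumptions: Python has no
codec named "ucs-2", so "utf-16-be" stands in for it here (the two
coincide for characters in the Basic Multilingual Plane), and the sample
character is arbitrary:

```python
import base64

text = "é"  # sample character, chosen for illustration

# Layer 1: the character coding (what MIME calls "charset").
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-be")  # stand-in for ucs-2; same bytes for BMP characters

# Layer 2: a transfer encoding applied to bytes that are already coded.
armoured = base64.b64encode(utf8)

print(utf8.hex(), utf16.hex(), armoured)

# Undoing the layers in reverse order recovers the character.
recovered = base64.b64decode(armoured).decode("utf-8")
print(recovered == text)
```

Base-64 operates on bytes and knows nothing about characters, which is
why using one word for both layers invites confusion.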

With iso-8859-* etc., the normal transmission coding is nothing more
than an 8-bit byte having the same value as the position of the
character in its character set: this is the cause of the common
confusion of the terms "character set" and "coding" in relation to these
codes.  But in Unicode (for example) the need to preserve this
distinction in terminology is evident.
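
This can be checked directly; the sketch below assumes Python's
"iso-8859-1" codec and uses variable names invented for the example:

```python
# Every possible byte value, decoded as iso-8859-1.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("iso-8859-1")

# For iso-8859-1 the byte value and the character's position coincide.
identity = all(ord(ch) == b for ch, b in zip(decoded, all_bytes))
print(identity)

# With a Unicode coding such as utf-8 they do not: position 233 needs two bytes.
utf8_bytes = chr(233).encode("utf-8")
print(len(utf8_bytes), utf8_bytes.hex())
```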

> Am I correct? If so, then it should have been vice versa -
> 'Character Set' in Edit/Preferences/Appearance/Fonts
> and 'Encodings' in the View menu.
> Or, because a usage of both 'Character Set' term and
> 'Encoding' term is kind of confusing, just use 'Encoding'
> in _both_ places as it was done before 4.5.
> Does it make sense?

I suspect that the problem here is that one menu relates to HTML and
HTTP, whereas the other relates to something in the Operating System
that is used for rendering the HTML documents (i.e. font assignments)
but has no direct relationship to HTML concepts.  (Of course, in a
properly-managed WWW the HTML/HTTP part should not need to be exposed
to the user at all: it would work entirely automatically, based on the
correct advertisements that all web servers were sending.  But the
reality is otherwise, as we all know.)

The View menu item should in my opinion be called "Coding" or "Character
code" or some variation on that. 

> (I have the following links to the pages that discuss this
>  terminology:
>  a) "W3C: Internationalization / Localization":
>     http://www.w3.org/International/O-charset.html

I think what I've said above is consistent with that discussion, albeit
there are some subtle niceties that aren't being tackled here at all.

>  I ran into this 'wording' issue while updating my
>  instructions for  Cyrillic in both Netscape 4.0x and 4.5+ -
>  I had there, "... you need to select a Cyrillic
>  Encoding using a View item in the menu..." and now I see
>  that I need to use a different wording for 2,3,4.0x and 4.5+,
>  because in 4.5+ it's "Character Set" ...

Since there are, for example, two different entries for
"Unicode(utf-7)" and "Unicode(utf-8)", which are two different codings
of the same character set (in the sense in which I understand the
terms), it's clear to me that calling this menu item "Character set" is
not really correct. 
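
To make the point concrete, a small sketch assuming Python's "utf-7" and
"utf-8" codecs and an arbitrary Cyrillic sample character:

```python
text = "Ж"  # sample character, chosen for illustration

# Two codings of the same character from the same character set.
as_utf7 = text.encode("utf-7")
as_utf8 = text.encode("utf-8")
print(as_utf7, as_utf8)

# Different bytes on the wire, one and the same code point once decoded.
same_character = as_utf7.decode("utf-7") == as_utf8.decode("utf-8") == text
print(same_character, hex(ord(text)))
```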

As for the Edit/Preferences/Appearance/Fonts, I don't really understand
the underlying technology of how Netscape 4 uses Windows features to get
(in most cases) such poor support for rfc2070 / i18n.  The terms listed
in the "Encoding" menu don't look like character codings, they look like
descriptions of repertoires to me (I would not object strongly to their
being termed "character sets" if that was thought to be more familiar to

It looks as if Moz.5 continues the terminology of NS4.5 etc.: in the
View/Default Character Set menu one is offered a list of different
codings, whereas in Edit/Preferences/Appearance/Fonts the label says
"for the Encoding", and the selection list contains the name "Western",
which I'd assume is a euphemism for Latin-1, the name of a repertoire. 

Thank goodness, however, that i18n seems to have been implemented
so much better here.  It's only the terminology that I'm commenting on.

So I agree with you, the terminology used in the user interface
seems to be inappropriate.

best regards

== END forwarded message ==
