Re: charsets in debian/control



On Mon, Dec 06, 2004 at 06:58:10PM +0000, Thaddeus H. Black wrote:
> I would not disagree with Peter or Daniel.  They are
> right in my view.  However, consider the following
> Unicode characters:

>   025A LATIN SMALL LETTER SCHWA WITH HOOK
>   025E LATIN SMALL LETTER CLOSED REVERSED OPEN E
>   0261 LATIN SMALL LETTER SCRIPT G
>   0264 LATIN SMALL LETTER RAMS HORN
>   0267 LATIN SMALL LETTER HENG WITH HOOK
>   027A LATIN SMALL LETTER TURNED R WITH LONG LEG
>   027F LATIN SMALL LETTER REVERSED R WITH FISHHOOK
>   0285 LATIN SMALL LETTER SQUAT REVERSED ESH
>   0295 LATIN LETTER PHARYNGEAL VOICED FRICATIVE
>   02A2 LATIN LETTER REVERSED GLOTTAL STOP WITH STROKE
>   FF21 FULLWIDTH LATIN CAPITAL LETTER A

> We are not speaking of a stricken Polish L, a
> double-accented Magyar O, or a euro sign.

Indeed we're not; most of the letters you listed here are specific to the
IPA, which would have no use at all in a control file as they're not part of
the writing system of any natural language.

Encodings and charsets are distinct concepts.  Just because the file is
specified in UTF-8 *encoding* does not mean we suddenly have to start coping
with the entire Unicode character set.  OTOH, the Unicode charset is also
the only one we have that is a superset of iso8859-1, iso8859-2, and
iso8859-15, so if you want to be able to *use* the ł, the €, and the ő in
the same file together with ñ and ê, the only sensible way to do so is to
specify a UTF-8 encoding.
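
To make that concrete, here is a rough Python sketch (the names SAMPLE,
LEGACY, encodable and coverage are made up for illustration, not taken from
any Debian tool).  It checks which of the five characters above each of the
three ISO-8859 charsets can represent, using only the codecs shipped with
Python's standard library:

```python
# Illustrative sketch only: which legacy charsets can represent which characters?
SAMPLE = "\u0142\u20ac\u0151\u00f1\u00ea"          # ł € ő ñ ê
LEGACY = ("iso8859-1", "iso8859-2", "iso8859-15")  # codec names known to Python


def encodable(text, codec):
    """Return True if every character of text exists in the given charset."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def coverage(text, codecs=LEGACY):
    """Map each codec to the subset of characters from text it can encode."""
    return {c: "".join(ch for ch in text if encodable(ch, c)) for c in codecs}


if __name__ == "__main__":
    for codec, chars in coverage(SAMPLE).items():
        print(f"{codec:11s} covers: {chars}")
    # No single legacy charset holds the whole string, but UTF-8 does.
    print("any legacy charset covers all:",
          any(encodable(SAMPLE, c) for c in LEGACY))
    print("utf-8 round-trips:",
          SAMPLE.encode("utf-8").decode("utf-8") == SAMPLE)
```

Each legacy charset covers only part of the sample (ñ and ê for iso8859-1,
ł and ő for iso8859-2, and €, ñ and ê for iso8859-15), while the UTF-8 round
trip preserves all five.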

> We are speaking of... well, to tell the truth I have no idea what these
> letters are.  Have you?  More to the point, should you and I learn to
> recognize such letters?  Should we expect basic Latin terminal fonts to
> cover them?  Is it reasonable to marginalize the ?'s and ?'s of Latin-1
> by lumping them with the "squat reversed esh"?

Why, what a lovely straw man you have there.

But yes, non-ASCII Latin-1 chars should not be given special status over
the national chars found in other languages spoken by project members.
Debian should be using either ASCII or Unicode; standardizing on Latin-1
makes no sense in a global project.

> In my view, a terminal which cannot correctly display
> the "?" is somewhat broken, and a user who does not
> recognize the "?" probably should learn.  I would not
> say the same with respect to the "squat reversed esh".
> However, this is just my view.

Mmm-hmm.

> Content-Type: text/plain; charset=unknown-8bit

Your opinion about which charset to use for Debian files would carry more
weight with me if you had enough experience with such things to properly
declare the character set on the non-ASCII mails you send.

-- 
Steve Langasek
postmodern programmer
