Re: Status of UTF-8 Debian changelogs

To: debian-policy@lists.debian.org
Subject: Re: Status of UTF-8 Debian changelogs
From: Steve Langasek <vorlon@netexpress.net>
Date: Fri, 6 Jun 2003 13:00:56 -0500
Message-id: <20030606180056.GM13286@tennyson.netexpress.net>
Mail-followup-to: Steve Langasek <vorlon@netexpress.net>, debian-policy@lists.debian.org
In-reply-to: <20030606163711.GI30760@seventeen>
References: <1054812938.3edf2b0ade210@imp.free.fr> <20030605122336.GG18490@prvidomaci.srce.hr> <1054898220.3ee0782ceec96@imp.free.fr> <20030606163711.GI30760@seventeen>

On Fri, Jun 06, 2003 at 06:37:11PM +0200, Bill Allombert wrote:
> On Fri, Jun 06, 2003 at 01:17:00PM +0200, Jérôme Marant wrote:
> > > I don't see all those (7|8)-bit-charset-using people requiring the
> > > same...

> >   Policy would mean all of them in the same charset, UTF-8 that is.

> The issue calls for two comments:

> 1) Changelogs are required to be written in English, so non-7-bit
> characters should be rare, and use of non-Latin-1 characters is
> probably not a good idea. For example, writing the name of a
> developer in Japanese characters might make it hard for people
> reading the changelog to understand who is referred to. This is
> unfortunate.

> 2) People write changelogs in whatever locale they use for development.
> Requiring them to use a special tool for writing changelogs would be a
> pain. I don't know how far lintian can check for UTF-8 encoding.

Of course, these comments give contradictory rationales.  The one says
that mandating UTF-8 is bad because people shouldn't use non-ASCII
characters in changelogs; the other says that mandating UTF-8 is bad
because it makes it harder for people to use non-ASCII characters in
changelogs.  I argue that the latter is a *good* thing; and where
exceptions are permitted, they should be encoded using a common
character set.

Checking a changelog for invalid UTF-8 is trivial.  Pipe the file
through 'iconv -f utf-8 -t ucs-4', discard the output, and check the
exit status.  If the stream contains any invalid UTF-8 sequences, iconv
will exit with an error code; and this will be the case for non-ASCII
text in the vast majority of other character sets.
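
For illustration, the check might be wrapped up roughly as follows.  This
is only a sketch: the function name is_valid_utf8 and the debian/changelog
path are assumptions made for the example, and it presumes an iconv that
knows the ucs-4 encoding name (GNU iconv does).

```shell
#!/bin/sh
# Hypothetical helper: exits 0 only if the named file decodes as UTF-8.
is_valid_utf8() {
    # Convert and throw the result away; only iconv's exit status matters.
    iconv -f utf-8 -t ucs-4 < "$1" > /dev/null 2>&1
}

# Example use from the top of a source package (the path is an assumption).
if [ -f debian/changelog ]; then
    if is_valid_utf8 debian/changelog; then
        echo "debian/changelog: valid UTF-8"
    else
        echo "debian/changelog: invalid UTF-8 sequence found" >&2
    fi
fi
```

A pure-ASCII changelog passes this check too, since ASCII is a subset of
UTF-8.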

-- 
Steve Langasek
postmodern programmer


References:
- Status of UTF-8 Debian changelogs
  - From: Jérôme Marant <jerome.marant@free.fr>
- Re: Status of UTF-8 Debian changelogs
  - From: Josip Rodin <joy@srce.hr>
- Re: Status of UTF-8 Debian changelogs
  - From: Jérôme Marant <jerome.marant@free.fr>
- Re: Status of UTF-8 Debian changelogs
  - From: Bill Allombert <allomber@math.u-bordeaux.fr>
