Re: Status of UTF-8 Debian changelogs

On Fri, Jun 06, 2003 at 06:37:11PM +0200, Bill Allombert wrote:
> On Fri, Jun 06, 2003 at 01:17:00PM +0200, Jérôme Marant wrote:
> > > I don't see all those (7|8)-bit-charset-using people requiring the
> > > same...

> >   Policy would mean all of them in the same charset, UTF-8 that is.

> The issue calls for two comments:

> 1) Changelogs are required to be written in English, so non-7-bit
> characters should be rare, and use of non-Latin-1 characters is
> probably not a good idea. For example, writing the name of a
> developer in Japanese characters might make it hard for people
> reading the changelog to understand who is referred to. This is
> unfortunate.

> 2) People write changelogs in whatever locale they use for development.
> Requiring them to use a special tool for writing changelogs would be a
> pain. I don't know how far lintian can check for UTF-8 encoding.

Of course, these two comments give contradictory rationales.  The first
says that mandating UTF-8 is bad because people shouldn't use non-ASCII
characters in changelogs; the second says that mandating UTF-8 is bad
because it makes it harder for people to use non-ASCII characters in
changelogs.  I argue that making it harder is a *good* thing; and where
exceptions are permitted, they should be encoded using a common
character set.

Checking a changelog for invalid UTF-8 is trivial.  Pipe the file
through 'iconv -f utf-8 -t ucs-4', discard the output, and check the
exit status.  If the stream contains any byte sequences that are not
valid UTF-8, iconv will exit with a non-zero status; and this will be
the case for the vast majority of text that uses non-ASCII characters
in other character sets.
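
As a rough sketch of that check (the function name is_valid_utf8 is
made up for illustration, and this is not what lintian actually does;
it only assumes an iconv that accepts the UTF-8 and UCS-4 encoding
names, as glibc's does):

```shell
# Hypothetical helper: returns 0 if the file decodes cleanly as UTF-8,
# non-zero otherwise (including when the file cannot be read).
is_valid_utf8() {
    # Convert to UCS-4 only to force a full decode of the input;
    # the converted output and any diagnostics are discarded.
    iconv -f UTF-8 -t UCS-4 "$1" > /dev/null 2>&1
}
```

Something like 'is_valid_utf8 debian/changelog || echo "not UTF-8"'
would then flag a changelog written in, say, Latin-1.  A pure-ASCII
file passes, since ASCII is a subset of UTF-8.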

Steve Langasek
postmodern programmer
