
Re: UTF-8 and changelog



On Mon, Aug 04, 2003 at 01:15:07PM -0400, Stephen Gran wrote:
> Hello all,
> 
> Just a quick question about encoding changelog in utf-8.  My normal
> locale is iso-8859-1 (en_US or so, I guess), and `file changelog`
> returns 'ASCII text'.  I tried 
> `iconv -f ISO-8859-1 -t utf8 changelog -o changelog.new`, but then 
> `file changelog.new` returns 'ASCII text' again, and diff shows no
> difference.  Do I need to be doing this each time, or can I leave it be?
> 
> As you can probably tell, I am not that familiar with the issues around
> utf-8, but my impression was that it is a superset of ASCII, so if I
> only use ASCII characters, it should be fine.  I checked with the line
> from developers-reference (footnote 76, IIRC) and got an exit code of 0,
> but since I am not sure about this kind of thing, I thought I had better
> ask.

US-ASCII defines only the characters 0x00 through 0x7F (0-127), and it is a
proper subset of both ISO-8859-1 (Latin-1) and UTF-8: every byte in that
range encodes the same character in all three. Put the other way round, both
Latin-1 and UTF-8 are deliberate supersets of US-ASCII, largely to avoid
gratuitous backwards incompatibility with a standard that had already been
in use for decades.

Thus, as long as your file contains no "high characters" (bytes 0x80 and
above, which US-ASCII does not define), all three of the following
statements are true:

1) It is a valid US-ASCII file
2) It is a valid ISO-8859-1 file
3) It is a valid UTF-8 file
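A small Python sketch of the three statements above, assuming nothing beyond the standard codecs. The helper names `is_ascii` and `latin1_to_utf8` and the sample changelog entry are hypothetical, made up for illustration. `latin1_to_utf8` applies the same character mapping as `iconv -f ISO-8859-1 -t UTF-8`, which is why the conversion in the question left the file unchanged:

```python
def is_ascii(data: bytes) -> bool:
    # US-ASCII defines only 0x00-0x7F; any byte >= 0x80 is a "high character".
    return all(b <= 0x7F for b in data)

def latin1_to_utf8(data: bytes) -> bytes:
    # Same character mapping as `iconv -f ISO-8859-1 -t UTF-8`:
    # decode the bytes as Latin-1, then re-encode the text as UTF-8.
    return data.decode("latin-1").encode("utf-8")

# Hypothetical ASCII-only changelog entry, used only for illustration.
sample = b"foo (1.0-1) unstable; urgency=low\n\n  * Initial release.\n"

assert is_ascii(sample)
# The same bytes decode to the same text under all three encodings...
decoded = {enc: sample.decode(enc) for enc in ("ascii", "latin-1", "utf-8")}
assert len(set(decoded.values())) == 1
# ...so converting Latin-1 to UTF-8 leaves the file byte-for-byte unchanged.
assert latin1_to_utf8(sample) == sample
```

To check a real file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to `is_ascii`.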

Things only differ once you use characters outside US-ASCII: Latin-1 stores
each of them as a single byte, while UTF-8 uses a sequence of two or more
bytes. So unless and until you add any, you don't need to convert anything;
that is also why your iconv run produced a file identical to the original.
If you do add some later, just enter them as UTF-8 and don't worry about
Latin-1 at all :)
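To see where the encodings start to differ, here is a minimal Python sketch using "é" as an arbitrary example character; `is_valid_utf8` is a hypothetical helper, not a standard function. It shows that the same character has different byte representations in Latin-1 and UTF-8, and that a Latin-1 byte on its own is not valid UTF-8:

```python
def is_valid_utf8(data: bytes) -> bool:
    # Strict decode: any malformed or truncated sequence raises an error.
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, chosen as an example
latin1_bytes = e_acute.encode("latin-1")  # one byte
utf8_bytes = e_acute.encode("utf-8")      # two bytes

print(latin1_bytes, utf8_bytes)     # b'\xe9' b'\xc3\xa9'
print(is_valid_utf8(latin1_bytes))  # False: a lone 0xE9 is not valid UTF-8
print(is_valid_utf8(utf8_bytes))    # True

# Reading UTF-8 bytes as if they were Latin-1 gives the familiar mojibake "Ã©".
misread = utf8_bytes.decode("latin-1")
```

This is why mixing the two encodings in one file causes trouble, and why sticking to UTF-8 for any non-ASCII characters avoids it.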

FWIW, you can use the 'en_US.UTF-8' locale if you want to see everything in
Unicode. However, at least on woody, some applications don't cope with it
well (though many of them have newer versions in unstable that handle it
just fine).
-- 
Joel Baker <fenton@debian.org>


