Re: UTF-8 and changelog



This one time, at band camp, Joel Baker said:
> US-ASCII only defines characters from 0x00 through 0x7F (0 - 127); it is a
> formal subset of both ISO-8859-1 (Latin-1) and UTF-8. Or, more precisely,
> both Latin-1 and UTF-8 are proper supersets of US-ASCII, largely to prevent
> being gratuitously backwards-incompatible with the standard that has been
> used for decades.

This is what I had thought, though mostly from osmosis rather than from
any real research.

> Thus, unless you're using "high characters" not defined in US-ASCII, all
> of the following three statements are true:
> 
> 1) It is a valid US-ASCII file
> 2) It is a valid ISO-8859-1 file
> 3) It is a valid UTF-8 file
> 
> It's only once you get into characters not found in US-ASCII that things
> differ. So, unless and until you add any, you don't need to worry about
> conversions (and, at that point, you should just add them as UTF-8
> characters, and not worry about Latin-1 at all :)
> 
> FWIW, you can use the 'en_US.UTF-8' locale if you want to see everything in
> Unicode. However, at least on woody, some applications won't cope with this
> well (many of them have newer versions in unstable that cope just fine,
> though).
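
For the archives, here is a minimal sketch of the three statements above,
assuming Python 3. The helper name decodes_identically and the sample
strings are made up for illustration; nothing here comes from dpkg or any
changelog tooling.

```python
def decodes_identically(data: bytes) -> bool:
    """Return True if data is valid in, and reads the same under,
    US-ASCII, ISO-8859-1 and UTF-8."""
    try:
        texts = {data.decode(enc) for enc in ("ascii", "latin-1", "utf-8")}
    except UnicodeDecodeError:
        # At least one of the three encodings rejected the bytes.
        return False
    return len(texts) == 1


# Hypothetical changelog line using only bytes 0x00-0x7F.
plain = b"  * New upstream release.\n"
print(decodes_identically(plain))   # True

# One "high character" (e with acute accent) is enough to make them differ.
name = "Jos\u00e9"
print(name.encode("latin-1"))       # b'Jos\xe9'      (one byte for the accent)
print(name.encode("utf-8"))         # b'Jos\xc3\xa9'  (two bytes for the accent)
print(decodes_identically(name.encode("utf-8")))   # False
```

The last call is False because the ASCII codec rejects any byte above 0x7F,
which is exactly the point where a conversion starts to matter.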

Thanks for filling in the blanks, and at least giving me enough to
google intelligently from here.  The changelog issue is at rest, but I
would like to know more about this for the future.

Thanks again,
-- 
 -----------------------------------------------------------------
|   ,''`.					     Stephen Gran |
|  : :' :					 sgran@debian.org |
|  `. `'			Debian user, admin, and developer |
|    `-					    http://www.debian.org |
 -----------------------------------------------------------------