This one time, at band camp, Joel Baker said:
> US-ASCII only defines characters from 0x00 through 0x7F (0 - 127); it is a
> formal subset of both ISO-8859-1 (Latin-1) and UTF-8. Or, more precisely,
> both Latin-1 and UTF-8 are proper supersets of US-ASCII, largely to prevent
> being gratuitously backwards-incompatible with the standard that has been
> used for decades.

This was what I had thought, mostly from osmosis, rather than any real
research.

> Thus, unless you're using "high characters" not defined in US-ASCII, all
> of the following three statements are true:
>
> 1) It is a valid US-ASCII file
> 2) It is a valid ISO-8859-1 file
> 3) It is a valid UTF-8 file
>
> It's only once you get into characters not found in US-ASCII that things
> differ. So, unless and until you add any, you don't need to worry about
> conversions (and, at that point, you should just add them as UTF-8
> characters, and not worry about Latin-1 at all :)
>
> FWIW, you can use the 'en_US.UTF-8' locale if you want to see everything in
> Unicode. However, at least on woody, some applications won't cope with this
> well (many of them have newer versions in unstable that cope just fine,
> though).

Thanks for filling in the blanks, and at least giving me enough to google
intelligently from here. The changelog issue is at rest, but I would like
to know more about this for the future.

Thanks again,
-- 
 -----------------------------------------------------------------
|   ,''`.                                            Stephen Gran |
|  : :' :                                        sgran@debian.org |
|  `. `'                        Debian user, admin, and developer |
|    `-                                     http://www.debian.org |
 -----------------------------------------------------------------
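
For anyone reading this in the archive later: the three statements Joel lists can be checked directly. What follows is only a minimal sketch in Python and is not from the thread; the sample text, the variable names and the helper `is_valid` are all made up for illustration. It assumes nothing beyond Python's standard `ascii`, `latin-1` and `utf-8` codecs.

```python
# Bytes restricted to 0x00-0x7F decode to the same text under all three codecs.
ascii_only = b"Thanks for filling in the blanks"
decoded = {codec: ascii_only.decode(codec) for codec in ("ascii", "latin-1", "utf-8")}
same_text = len(set(decoded.values())) == 1

# A "high character" (here U+00E9, e with acute accent) is where they diverge.
e_acute = "\u00e9"
latin1_bytes = e_acute.encode("latin-1")  # one byte: 0xE9
utf8_bytes = e_acute.encode("utf-8")      # two bytes: 0xC3 0xA9


def is_valid(data, codec):
    """Hypothetical helper: True if `data` decodes cleanly with `codec`."""
    try:
        data.decode(codec)
        return True
    except UnicodeDecodeError:
        return False


print(same_text)
print(is_valid(latin1_bytes, "ascii"), is_valid(latin1_bytes, "utf-8"))
```

Note that the Latin-1 byte 0xE9 is neither valid US-ASCII nor valid UTF-8 on its own, which is why a file stops being "all three at once" as soon as it contains such a character.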