
Re: which locale



On 8/14/06, Mathias Brodala <info@noctus.net> wrote:
> Hello Wei Hu.
>
>> Actually I have no filename problems when I mount NTFS/VFAT
>> partitions. But I can NOT properly display some .torrent files that I
>> downloaded from the Internet. They may use non-UTF-8 encodings such as
>> GB18030 or GB2312.
>
> I cannot imagine how such a thing happened, but I cannot rule it out either.
>
>> Can't Unicode handle non-Unicode filenames?
>
> There are no "non-Unicode filenames", because Unicode is everything. Almost every
> known character in the world exists in the Unicode table. The UTF-8 encoding is only
> one way of using it. (Others would be UTF-16, UTF-32 and even the limited ISO-*
> charsets.)
>
> So, if your filesystem really is encoded with UTF-8, there should be no problem
> displaying even the strangest characters.
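To illustrate the quoted point that UTF-8 is only one of several ways to serialize Unicode text, here is a minimal Python sketch using only built-in codecs. The filename and the helper `encoded_sizes` are hypothetical, chosen for illustration: the same text produces different byte counts in UTF-8, UTF-16 and UTF-32, while a single-byte charset like ISO-8859-1 cannot hold the Chinese characters at all.

```python
def encoded_sizes(text: str) -> dict[str, int]:
    """Return how many bytes `text` takes in several Unicode encoding forms."""
    sizes = {}
    for codec in ("utf-8", "utf-16-le", "utf-32-le"):  # "-le" variants omit the byte-order mark
        data = text.encode(codec)
        assert data.decode(codec) == text  # every form round-trips to the same text
        sizes[codec] = len(data)
    return sizes


name = "文件.torrent"  # hypothetical filename: two Chinese characters plus an ASCII extension
print(encoded_sizes(name))

# A limited single-byte charset such as ISO-8859-1 has no codes for these characters.
try:
    name.encode("iso-8859-1")
except UnicodeEncodeError as exc:
    print("ISO-8859-1 cannot encode:", exc.reason)
```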

There are non-Unicode filenames. Yes, Unicode has a goal of being able
to represent every character, but that does not mean that all
filenames are stored in a Unicode encoding. Of course, the UTF-8
encoding of Unicode is backwards compatible with ASCII, so an ASCII
filename should display normally on a UTF-8 system. But non-ASCII
filenames encoded in GB18030 or GB2312 will not display properly on a
UTF-8 system. (ISO-8859-1 filenames would work only if they happened
to be restricted to the ASCII-compatible subset.) I have had this
problem with files I downloaded from Japanese sources that were
encoded in Shift-JIS or another Japanese encoding. The computer does
not automatically convert other encodings to their UTF-8 equivalents
when saving.
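The missing conversion can be sketched in Python under one stated assumption: you already know (or have correctly guessed) the source encoding. The helpers `to_utf8` and `is_valid_utf8` and the sample filenames are hypothetical. The sketch shows that raw GB18030 bytes are not valid UTF-8, and that decoding with the right codec and re-encoding as UTF-8 fixes the name.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode a filename stored in a legacy encoding (e.g. GB18030, Shift-JIS) as UTF-8."""
    # Decoding turns the legacy bytes into Unicode text; encoding writes that text out as UTF-8.
    return raw.decode(source_encoding).encode("utf-8")


def is_valid_utf8(raw: bytes) -> bool:
    """Report whether a UTF-8 system could decode these bytes as-is."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical filenames, stored the way a GB18030 or Shift-JIS system would save them.
gb_bytes = "文件.torrent".encode("gb18030")
sjis_bytes = "ファイル.torrent".encode("shift_jis")

print(is_valid_utf8(gb_bytes))                      # the raw GB18030 bytes are not valid UTF-8
print(gb_bytes.decode("utf-8", errors="replace"))   # roughly what a UTF-8 system would show
print(to_utf8(gb_bytes, "gb18030").decode("utf-8"))
print(to_utf8(sjis_bytes, "shift_jis").decode("utf-8"))
```

Decoding with the wrong source codec either raises an error or silently yields the wrong characters, which is why the conversion cannot be fully automatic.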

It would be correct to say that there are no filenames that cannot be
*represented* with Unicode.

PS It appears that GB18030 is a Unicode encoding and, like UTF-8, it is
backwards compatible with ASCII, although beyond that it differs from
UTF-8. And GB2312 seems to be a character set, like Unicode, and its
EUC encoding (EUC-CN) is ASCII compatible. You learn something new
every day.
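The claims in the PS can be checked with Python's built-in codecs in a minimal sketch; the sample strings are hypothetical. It assumes Python's `gb2312` codec, which implements the EUC-CN form. ASCII-only text encodes to identical bytes in all of these encodings, GB18030 reproduces the GB2312 bytes for GB2312 characters, and GB18030 can also encode characters that GB2312 lacks.

```python
ascii_name = "readme.txt"  # hypothetical ASCII-only filename
for codec in ("utf-8", "gb18030", "gb2312", "iso-8859-1"):
    # ASCII-compatible encodings store pure-ASCII text as exactly the ASCII bytes.
    assert ascii_name.encode(codec) == ascii_name.encode("ascii")

chinese = "中文"
# For characters that GB2312 covers, GB18030 produces the same EUC-CN bytes.
assert chinese.encode("gb2312") == chinese.encode("gb18030") == b"\xd6\xd0\xce\xc4"

# GB18030 covers all of Unicode, including characters outside GB2312 such as this emoji.
emoji = "\U0001F600"
print(emoji.encode("gb18030"))
try:
    emoji.encode("gb2312")
except UnicodeEncodeError:
    print("GB2312 has no code for", hex(ord(emoji)))
```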

Cheers,
Kelly
