
Re: Release announcement simplified Chinese translation update

Dongsheng Song <dongsheng.song@gmail.com> writes:

> 2009/2/18 Vern Sun <s5unty@gmail.com>:
>> on Wed, 2009-02-18 at 02:43 +0800, Anthony Wong wrote:
>>> I suggest 1. to convert all existing Chinese WML files for the Debian website
>>> from Big5 to UTF-8
>>> Any comments?
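>> Converting everything to UTF-8 may cause problems. Suppose two users
>> (one writing Simplified Chinese, one writing Traditional Chinese) each
>> contribute a translation: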
>> % cat foo.tc
>> 中國
>> % cat foo.sc
>> 中国
>> % enca foo.sc foo.tc
>> foo.sc: Universal transformation format 8 bits; UTF-8
>> foo.tc: Universal transformation format 8 bits; UTF-8
>> Converting the Simplified Chinese user's translation from UTF-8 to GB2312 works:
>> ~% iconv -f utf8 -t gb2312 foo.sc > foo.sc.gb
>> But converting the Traditional Chinese user's translation from UTF-8 to GB2312 fails:
>> ~% iconv -f utf8 -t gb2312 foo.tc > foo.tc.gb
>> iconv: illegal input sequence at position 3
>> Likewise, converting the Simplified Chinese user's translation from UTF-8 to BIG5 also fails:
>> ~% iconv -f utf8 -t big5   foo.sc > foo.sc.big
>> iconv: illegal input sequence at position 3
>> ~% iconv -f utf8 -t big5   foo.tc > foo.tc.big
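> I don't understand why we still cling to GB2312/Big5. Wouldn't it be
> better to use UTF-8 directly?
> sc <=> tc conversion should only convert the content; there is no need
> for the extra step of converting to an obsolete encoding.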
> ---
> Dongsheng Song

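Vern Sun's concern is this: with GB2312/BIG5 encoding, one can tell
directly from the encoding whether a document is Simplified or
Traditional Chinese, and hence whether a Simplified/Traditional
conversion is needed before the encoding conversion, whereas UTF-8 itself is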

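However, with UTF-8 it should still be possible to tell whether the
Chinese characters at hand are Simplified or Traditional, and one can also skip the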

Moreover, when posting to international mailing lists, please prefer
English so that non-Chinese speakers can follow.

To non-Chinese speakers: the discussion above concerns whether using
UTF-8 might lead to premature encoding conversion. Because UTF-8 can
hold both character sets, a file's encoding no longer reveals whether
its text is Simplified or Traditional Chinese, and converting
Traditional Chinese characters to GB2312 fails. This is a valid
concern, but it shouldn't be a problem if the character set is checked
beforehand. Where possible, no encoding conversion is needed at all,
since UTF-8 handles both character sets uniformly.
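
As a rough illustration of checking beforehand, here is a minimal
Python sketch that tests whether a string fits a legacy encoding before
any conversion is attempted. The helper name encodable() and the sample
strings are only assumptions for illustration (they mirror foo.sc and
foo.tc above); it is not what the website build scripts do. Note that
encodability is only a heuristic: many characters are shared between
the two scripts, so a short text may fit both encodings.

```python
def encodable(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical samples mirroring foo.sc and foo.tc from the quoted mail.
samples = {"foo.sc": "中国", "foo.tc": "中國"}

for name, text in samples.items():
    for enc in ("gb2312", "big5", "utf-8"):
        status = "ok" if encodable(text, enc) else "cannot be encoded"
        print(f"{name} -> {enc}: {status}")
```

Both samples encode to UTF-8, which is why staying in UTF-8 avoids the
problem altogether.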

Deng Xiyue
