Re: Release announcement simplified Chinese translation update

To: debian-www@lists.debian.org
Cc: debian-chinese-gb@lists.debian.org
Subject: Re: Release announcement simplified Chinese translation update
From: Deng Xiyue <manphiz-guest@users.alioth.debian.org>
Date: Thu, 19 Feb 2009 10:07:28 +0800
Message-id: <87y6w3rxgf.fsf@users.alioth.debian.org>
References: <499844F5.6030802@debian.org> <20090216010302.GF1723@ftbfs.org> <87vdraddvi.fsf@gmail.com> <49996DDA.2090100@linux.org.tw> <20090216142828.GA20558@ftbfs.org> <49998D1C.2010304@linux.org.tw> <20090216163311.GA4739@ftbfs.org> <499A3741.3030602@linux.org.tw> <46688e190902171043p3738e31sf2ffb71547e18cc8@mail.gmail.com> <20090218151003.GA3946@debian> <4b3406f0902181706v17607e39rddfc5fc7101f4920@mail.gmail.com>

Dongsheng Song <dongsheng.song@gmail.com> writes:

> 2009/2/18 Vern Sun <s5unty@gmail.com>:
>> on Wed, 2009-02-18 at 02:43 +0800, Anthony Wong wrote:
>>> I suggest 1. to convert all existing Chinese WML files for the Debian website
>>> from Big5 to UTF-8
>>>
>>> Any comments?
>>>
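>> Converting everything to UTF-8 may cause a problem. Suppose two users
>> (one writing Simplified and one writing Traditional Chinese) each
>> contribute a translation: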
>>
>> % cat foo.tc
>> 中國
>>
>> % cat foo.sc
>> 中国
>>
>> % enca foo.sc foo.tc
>> foo.sc: Universal transformation format 8 bits; UTF-8
>> foo.tc: Universal transformation format 8 bits; UTF-8
>>
>> Converting the Simplified user's translation from UTF-8 to GB2312 works:
>> ~% iconv -f utf8 -t gb2312 foo.sc > foo.sc.gb
>>
>> but converting the Traditional user's translation from UTF-8 to GB2312 fails:
>> ~% iconv -f utf8 -t gb2312 foo.tc > foo.tc.gb
>> iconv: illegal input sequence at position 3
>>
>> Likewise, converting the Simplified user's translation from UTF-8 to BIG5 fails:
>> ~% iconv -f utf8 -t big5   foo.sc > foo.sc.big
>> iconv: illegal input sequence at position 3
>>
>> ~% iconv -f utf8 -t big5   foo.tc > foo.tc.big
>>
>
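> I don't understand why we are still clinging to GB2312/Big5. Isn't it
> better to just use UTF-8 directly? An sc <=> tc conversion should only
> convert the content; converting to an obsolete encoding on top of that
> is unnecessary.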
>
> ---
> Dongsheng Song

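Vern Sun's point is this: if GB2312/BIG5 is used, the encoding itself
tells you whether a document is Simplified or Traditional Chinese, and
therefore whether a Simplified/Traditional conversion is needed before
the encoding conversion. UTF-8, by contrast, can hold both at once.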

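However, even with UTF-8 it should still be possible to tell whether a
given character is Simplified or Traditional, and the encoding-conversion
step can be dropped entirely, so this should not be a problem.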

Moreover, when posting to international mailing lists, please prefer
using English so that non-Chinese speakers can follow.

To non-Chinese speakers: the discussion above raises a concern that using
UTF-8 might lead to premature encoding conversion. Since UTF-8 can hold
both character sets, converting a file containing Traditional Chinese
characters to GB2312 will fail. This is a valid concern, but it shouldn't
be a problem if the character set is checked beforehand, and where
possible no encoding conversion is needed at all, since UTF-8 handles
both character sets uniformly.
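As a rough illustration of "checking the character set beforehand", the
sketch below reuses Vern Sun's iconv experiment: a UTF-8 file whose
characters all exist in GB2312 but not in Big5 is probably Simplified,
and vice versa. The function name classify_zh is hypothetical (not part
of any Debian tool), it assumes GNU (glibc) iconv, which exits non-zero
on an unconvertible character, and it is only a heuristic: text made up
solely of characters shared by both sets (such as 中) cannot be told apart.

```shell
#!/bin/sh
# Hypothetical helper: guess whether a UTF-8 file is Simplified or
# Traditional Chinese by testing which legacy charset can represent it.
# Assumes GNU iconv, which returns a non-zero status on illegal input.
classify_zh() {
    gb=no
    big5=no
    # Succeeds only if every character exists in GB2312 (Simplified set).
    if iconv -f UTF-8 -t GB2312 "$1" >/dev/null 2>&1; then gb=yes; fi
    # Succeeds only if every character exists in Big5 (Traditional set).
    if iconv -f UTF-8 -t BIG5 "$1" >/dev/null 2>&1; then big5=yes; fi
    case "$gb,$big5" in
        yes,no)  echo simplified ;;
        no,yes)  echo traditional ;;
        yes,yes) echo ambiguous ;;  # only characters common to both sets
        *)       echo mixed ;;      # fits neither legacy charset
    esac
}
```

With the foo.sc and foo.tc files from the example above, classify_zh
would be expected to report foo.sc as simplified and foo.tc as
traditional, without converting either file.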

Regards,
Deng Xiyue
