
RE: gb <==> big5 conversion module



hi all (sorry for writing in GB2312)
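     Anthony Fok said that some of the characters I listed last time fall
within the big5+ range. :) Some gb2312 characters can be written several
different ways in big5, and that case can only be resolved by converting
word by word rather than character by character. I am now working on a word
segmentation program, and it is going fairly smoothly: I found some relevant
papers and have already written a prototype. What is missing is a
gb2312<->big5 phrase correspondence table. For the gb2312 segmentation
dictionary I am using the phrases shipped with unicon-im, and a big5
dictionary should be available in xcin. Unfortunately none of these
dictionaries carry part-of-speech information :(, so we will have to make do.
For now I do not plan to call iconv from autoconvert, because not every
platform uses glibc. :) It would still be better to unify the character
tables, so I am looking forward to your results.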
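
To illustrate the word-by-word idea, here is a minimal sketch in Python. It is
not the autoconvert prototype: it assumes a simple forward longest-match
segmentation, works on Unicode strings instead of raw gb2312/big5 bytes, and
uses two tiny hypothetical tables (PHRASE_TABLE and CHAR_TABLE) in place of the
real dictionaries. The function name convert is likewise made up for this
example.

```python
# Hypothetical toy tables; real ones would be built from dictionary files.
PHRASE_TABLE = {
    "头发": "頭髮",   # "hair": here 发 must become 髮
    "发现": "發現",   # "discover": here 发 must become 發
}
CHAR_TABLE = {
    "头": "頭",
    "发": "發",       # per-character default; 髮 is reachable only via a phrase
    "现": "現",
}


def convert(text, phrases=PHRASE_TABLE, chars=CHAR_TABLE):
    """Convert text using the longest phrase match first, then single characters."""
    max_len = max((len(p) for p in phrases), default=1)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate phrase starting at position i.
        for n in range(min(max_len, len(text) - i), 1, -1):
            chunk = text[i:i + n]
            if chunk in phrases:
                out.append(phrases[chunk])
                i += n
                break
        else:
            # No phrase matched: fall back to the character table,
            # passing unknown characters through unchanged.
            ch = text[i]
            out.append(chars.get(ch, ch))
            i += 1
    return "".join(out)
```

With these tables, convert("头发") picks 髮 while convert("发现") picks 發, which
a pure character table cannot do. A real converter would also need to decode
its input from gb2312 and encode its output as big5, and a plain longest-match
pass will mis-segment some ambiguous text.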

					Yu Guanghui

> -----Original Message-----
> From: Anthony Fok [mailto:foka@master.debian.org]On Behalf Of Yong Li
> Sent: Sunday, January 14, 2001 4:54 PM
> To: debian-chinese-big5@lists.debian.org
> Subject: gb <==> big5 conversion module
>
>
> Hello T.H.Hsieh and Yu Guanghui,
> I just came back yesterday from a long vacation and found that I missed
> both your posts regarding gb <==> big5 conversion.
>
> Before I left for vacation, I was also working on writing a gb <==> big5
> gconv module. The first part of my plan was to establish a "best" mapping
> between gb and big5. I did not take any existing conversion table because
> none of them documented how they got their conversions and I don't feel
> comfortable with that. So I rolled my own and took this opportunity to
> check a few popular gb <==> big5 converters. Most of this work has been
> finished. All the gb -> big5 conversions have been checked, but there are
> still some big5 -> gb conversions left. The result so far looks good.
> Compared with the table of 130+ unmapped gb codes posted by Yu Guanghui a
> while ago, 35 of them are mapped in my table. There are 4 codes not mapped
> in my table, but mapped in autoconvert. However, I suspect that autoconvert
> made a mistake in all 4 cases.
> I'll write a more detailed post describing my methodology, conversion
> table and the comparison results in the next few days. Then I'd like to
> hear from you. If we all agree upon it, it's fairly easy to write the module.
> Hopefully it will be in time for the 2.2.1 release, which is said to be soon.
>
> Regards,
> Yong Li
> (rigel)
>
> --
> | This message was re-posted from debian-chinese-gb@lists.debian.org
> | and converted from gb2312 to big5 by an automatic gateway.
>
>


