
Re: PinYin Standard


> Here is the wc on pinyin.cin. I think I can use this as a first base to
> build a prototype.
> It has 20976 lines, including multi-character input... so the single
> character to pinyin entries don't really cover all of GBK. Also, the
> standard pronunciation of a word should include the tone: 1-6 for
> Cantonese, 1-4 for Mandarin...
> ftp://ftp.unicode.org/Public/UNIDATA/Unihan.txt
> In this file you get more detail on how it should look...
> pinyin.cin may not be enough for my use... of course better than
> nothing :)
> Thanks a lot for your input

AFAIK there is no database which contains all Chinese characters... there are
too many, and most of them are not used at all. Big5 and GB2312 contain the
most common ones, and for daily life that is sufficient. Cantonese readings
should only be available for the HKSCS extensions in the Unihan database,
although Cantonese speakers also use Big5 and nowadays GB characters, with
different pronunciations for them.
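
To make the Unihan suggestion concrete, here is a rough sketch of pulling
per-character readings out of a Unihan-style file. It assumes the
tab-separated "U+XXXX<TAB>field<TAB>value" layout with "#" comment lines and
the kMandarin / kCantonese field names; the function name and the sample
lines are made up for illustration, so check them against the real file
before relying on them.

```python
from collections import defaultdict

# Reading fields to keep; assumed to be the relevant Unihan field names.
READING_FIELDS = {"kMandarin", "kCantonese"}

# Illustrative sample only, not copied from the real database.
SAMPLE_LINES = [
    "# comment lines start with a hash",
    "U+4E2D\tkMandarin\tZHONG1 ZHONG4",
    "U+4E2D\tkCantonese\tzung1 zung3",
    "U+4E2D\tkDefinition\tcentral; center, middle",
]


def parse_unihan_readings(lines, fields=READING_FIELDS):
    """Return {character: {field: [readings]}} for the selected fields."""
    readings = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # not a "codepoint, field, value" record
        codepoint, field, value = parts
        if field not in fields or not codepoint.startswith("U+"):
            continue
        char = chr(int(codepoint[2:], 16))
        # A character may carry several space-separated readings.
        readings[char][field] = value.split()
    return dict(readings)


if __name__ == "__main__":
    for char, by_field in parse_unihan_readings(SAMPLE_LINES).items():
        print(char, by_field)
```

For the real file you would pass an open file object instead of the sample
list. Characters that have no kCantonese entry simply will not get that key,
which is one way to see how far the coverage actually goes.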

You should also keep in mind that the tone (1-5 for Mandarin, if you count
the neutral tone as 5) of a character sometimes varies depending on the
characters before and after it.
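
One well-known case of this is Mandarin third-tone sandhi: a third tone
directly followed by another third tone is pronounced as a second tone. The
sketch below shows only that pairwise rule, as a deliberate simplification.
Longer runs of third tones depend on phrasing, and characters with several
dictionary readings are not handled at all. The function name and the
(syllable, tone) tuple format are my own choices for illustration.

```python
def apply_third_tone_sandhi(syllables):
    """Apply the simple pairwise third-tone rule.

    syllables: list of (pinyin, tone) tuples with tone in 1..5.
    A tone 3 directly followed by a dictionary tone 3 becomes tone 2.
    """
    result = []
    for i, (syllable, tone) in enumerate(syllables):
        # Look at the dictionary tone of the following syllable, if any.
        next_tone = syllables[i + 1][1] if i + 1 < len(syllables) else None
        if tone == 3 and next_tone == 3:
            tone = 2
        result.append((syllable, tone))
    return result


if __name__ == "__main__":
    print(apply_third_tone_sandhi([("ni", 3), ("hao", 3)]))
```

So a plain character-to-reading table gives you the dictionary tone, and the
spoken tone has to be derived in a second step like this one.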

And if you want to include minority dialects and languages too, what about
Taiwanese, Hakka, etc.?

If you want to do that you really need a real dictionary.
-- 
Arne Goetje <20020531antispam@gmx.net> 
(Spam catcher.  Address might change in future!)
PGP/GnuPG key: 1024D/685D1E8C
Fingerprint: 2056 F6B7 DEA8 B478 311F  1C34 6E9F D06E 685D 1E8C
Key available at wwwkeys.pgp.net.   Encrypted e-mail preferred.



| This message was re-posted from debian-chinese-big5@lists.debian.org
| and converted from big5 to gb2312 by an automatic gateway.
