Re: Bug#478811: ITP: sunpinyin -- An input method engine based on SLM

To: debian-chinese-gb@lists.debian.org
Subject: Re: Bug#478811: ITP: sunpinyin -- An input method engine based on SLM
From: Deng Xiyue <manphiz-guest@users.alioth.debian.org>
Date: Mon, 05 May 2008 14:55:53 +0800
Message-id: <1209970553.5083.8.camel@localhost>
In-reply-to: <f893e6e20805042304q3096420aq5646152bd77bba1b@mail.gmail.com>
References: <20080501094548.GA9657@tchaikov> <20080505051017.GA8863@ibook> <f893e6e20805042304q3096420aq5646152bd77bba1b@mail.gmail.com>

On Mon, 2008-05-05 at 14:04 +0800, Kov Chai wrote:
> 
> 
> 2008/5/5 ZhengPeng Hou <zhengpeng.hou@gmail.com>:
>         --12:25:37--  http://mentors.debian.net/debian/pool/main/s/sunpinyin/sunpinyin_1.0.orig.tar.gz
>                   => `sunpinyin_1.0.orig.tar.gz'
>         Resolving mentors.debian.net... 64.79.197.109
>         Connecting to mentors.debian.net|64.79.197.109|:80... connected.
>         HTTP request sent, awaiting response... 200 OK
>         Length: 39,363,185 (38M) [application/x-gzip]
>         38M? What is all of that?
>  
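> 38M is indeed a bit large. Most of it is four files in the data
> directory: lm_sc.t3g.{sparc,i386} (6727K*2) and
> pydict_sc.bin.{sparc,i386} (23M*2). The former is the threaded
> language model data [1]. Threading speeds up lookup and compresses the
> data, much like building an index. With it, the probability P(S) that
> a string of characters (S = {W_1, W_2, W_3, ..., W_n}) forms a
> sentence under the n-gram language model can be computed fairly
> quickly. The latter is the lexicon, i.e. what people usually call the
> input method's dictionary. It supports incomplete pinyin and
> word-to-word conversion. Because both big-endian and little-endian
> architectures have to be supported, I simply included the data files
> for both cases.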

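This sounds like data that ought to be architecture-independent. So the
big-endian/little-endian conversion is not done in the program, and
separate data files are shipped instead? It feels like that should be
avoidable.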
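
A minimal sketch of what doing the conversion in the program could look
like, written in Python rather than taken from sunpinyin's own code. The
record layout here (a 32-bit word id followed by a 32-bit float) is
hypothetical and is not the real lm_sc.t3g or pydict_sc.bin format. The
only point is that a file written in one fixed byte order can be read on
any host, so a single copy of the data would be enough:

```python
import struct
import sys

# Hypothetical record layout, NOT the real sunpinyin format:
# one unsigned 32-bit word id followed by one 32-bit float.
# The "<" prefix pins the on-disk byte order to little-endian,
# independent of the machine running the code.
RECORD = struct.Struct("<If")


def pack_records(records):
    """Serialize (word_id, value) pairs in the fixed on-disk byte order."""
    return b"".join(RECORD.pack(word_id, value) for word_id, value in records)


def unpack_records(blob):
    """Decode the bytes on any host; struct swaps bytes when the host needs it."""
    return list(RECORD.iter_unpack(blob))


if __name__ == "__main__":
    blob = pack_records([(1, 0.5), (258, 0.25)])
    # The host byte order is printed only for information; the decoded
    # records do not depend on it.
    print(sys.byteorder, unpack_records(blob))
```

The trade-off is a byte swap at load time on hosts whose native order
differs from the file's, which may matter if the program expects to map
the file into memory and use it in place.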

> 
> Is it worth splitting the data files out into a separate package,
> sunpinyin-data? Or is there a better way?

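If the data were architecture-independent, moving it into a separate
-data package would save space in the archive. As things stand, though,
that does not seem to be the case.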

> 
> Thanks.
> 
> -- 
> [1] http://blogs.sun.com/yongsun/entry/sunpinyin%E4%BB%A3%E7%A0%81%E5%AF%BC%E8%AF%BB_%E4%BA%94
> [2] http://blogs.sun.com/yongsun/entry/sunpinyin%E4%BB%A3%E7%A0%81%E5%AF%BC%E8%AF%BB_%E4%B8%83
> 
> 
> -- 
> Regards
> Kov Chai
-- 
Regards,
Deng Xiyue, a.k.a. manphiz
