
Is statistical data extracted from the web DFSG compliant?

Hi everyone,

I am working on a Chinese input method engine [1,2]. This input method
engine is based on a statistical language model. We extract the
language model, using a training algorithm, from a 150MiB corpus
collected from some selected Chinese websites. The extracted data (the
language model) does not contain any text from these websites. Only
statistics for the frequencies of occurrence of given character
sequences are stored, in a binary format. The size of the resulting
statistical data is 24MiB. These data files are built into separate
binary packages.
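
To make clear what kind of data this is, here is a minimal sketch in
Python. It is not the actual training code from upstream: the function
names, the choice of two-character sequences, and the binary layout
are all assumptions made up for illustration. It only shows the
general idea of turning text into frequency counts and writing those
counts, not the text itself, in a binary form.

```python
import struct
from collections import Counter


def count_ngrams(text, n=2):
    """Count every run of n consecutive characters in text.

    Hypothetical helper; the real model may use other units or orders.
    """
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def pack_counts(counts):
    """Serialize counts to bytes.

    Illustrative layout only: for each entry, one little-endian uint32
    per character (its code point), followed by a uint32 frequency.
    """
    out = bytearray()
    for gram, freq in sorted(counts.items()):
        for ch in gram:
            out += struct.pack("<I", ord(ch))
        out += struct.pack("<I", freq)
    return bytes(out)


# A tiny stand-in for the corpus.
corpus = "今天天气好,今天天气不错"
counts = count_ngrams(corpus, 2)
blob = pack_counts(counts)
```

The resulting `blob` holds only character sequences and their counts;
the original sentences cannot be read back from it directly.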

Upstream has released both the data and the source code for reading
and writing this binary format under a dual license, CDDL and
LGPLv2.1. So I believe this software (and data) is free. But I am
afraid that this package does not comply with DFSG#2 in some sense. Am
I right?

Is there any way to fix this? Is removing the non-free piece the only
way? I've put the explanation in README.Debian of this package.

Thanks in advance.


[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=478811
[2] http://mentors.debian.net/cgi-bin/sponsor-pkglist?action=details;package=sunpinyin-slm

Kov Chai
