
Bug#403619: languagetool -- rule-based language checker



This is an overview of LanguageTool's runtime dependencies.

Libs that exist in Debian in the version LT needs:

  libcommons-lang-java 2.4
  libcommons-logging-java 1.1.1
  libcommons-validator-java 1.3.1

Libs that exist in Debian but not in the version LT needs (I checked
'unstable'):

  libsegment-java 1.3.5, LT needs 1.3.0 and LT 1.8 will need 1.3.8
  libjwordsplitter-java 3.0, LT needs 3.3
  libmorfologik-stemming-java 1.2.2, LT needs morfologik-fsa-1.5.2 and
     morfologik-stemming-1.5.2 (the lib has been split up)
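
To make the mismatches in this list easier to see at a glance, here is a
small sketch that compares the versions. The package names and version
numbers are taken from the list above; the helper names (parse, status) are
made up for illustration, and the comparison is a plain numeric one, not
Debian's own version ordering.

```python
def parse(version):
    """Turn a dotted version string such as '1.3.5' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def status(debian, needed):
    """Describe how the Debian version relates to the version LT needs."""
    d, n = parse(debian), parse(needed)
    if d == n:
        return "matches"
    return "older than needed" if d < n else "newer than needed"


# (package, version in Debian unstable, version LT needs), from the list above
DEPS = [
    ("libsegment-java", "1.3.5", "1.3.0"),
    ("libjwordsplitter-java", "3.0", "3.3"),
    ("libmorfologik-stemming-java", "1.2.2", "1.5.2"),
]

if __name__ == "__main__":
    for name, debian, needed in DEPS:
        print(f"{name}: Debian {debian}, LT needs {needed}: "
              f"{status(debian, needed)}")
```

Note that libsegment-java in Debian is newer than what LT needs today, but
older than the 1.3.8 that LT 1.8 will need.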

Libs that I did not find in Debian and that we require:

  tika-core-0.9.jar from http://tika.apache.org/, Apache License 2.0

Libs that I did not find in Debian but that are only required for Chinese, so
I think we could do without them for now:

  ictclas4j-1.0.jar from http://code.google.com/p/ictclas4j/, 
    Apache License 2.0
  CJFtransform_v1.0.1_bin.jar from http://code.google.com/p/cjftransform/,
     Apache License 2.0 

The internal dictionaries we use are huge when saved as text files (e.g.
200MB for German alone). We therefore compress them into finite-state
automata using the morfologik-stemming project, which compresses about 10
times better than bzip2 (tested with the German dictionary). We describe how
to dump the dictionaries to plain text at the URL that Marcin has posted.
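
To give a rough idea of why a finite-state automaton is so compact for word
lists, here is a toy sketch. It is not the morfologik-stemming code or its
file format, only an illustration under simple assumptions: it builds a
plain trie from a few English words (chosen for the example) and then merges
identical subtrees, which is what lets shared word endings be stored once.
All function names are made up.

```python
def build_trie(words):
    """Build a plain prefix tree; every node is a dict with a final flag."""
    root = {"final": False, "edges": {}}
    for word in words:
        node = root
        for ch in word:
            node = node["edges"].setdefault(ch, {"final": False, "edges": {}})
        node["final"] = True
    return root


def count_trie_nodes(node):
    """Count all nodes in the trie, including the root."""
    return 1 + sum(count_trie_nodes(c) for c in node["edges"].values())


def minimize(node, registry):
    """Return a canonical id for this node; equal subtrees get the same id."""
    signature = (
        node["final"],
        tuple(sorted((ch, minimize(child, registry))
                     for ch, child in node["edges"].items())),
    )
    if signature not in registry:
        registry[signature] = len(registry)
    return registry[signature]


def count_minimal_states(words):
    """Number of states left after merging identical subtrees of the trie."""
    registry = {}
    minimize(build_trie(words), registry)
    return len(registry)


WORDS = ["walk", "walks", "walked", "walking",
         "talk", "talks", "talked", "talking"]

if __name__ == "__main__":
    print("trie nodes:    ", count_trie_nodes(build_trie(WORDS)))
    print("automaton states:", count_minimal_states(WORDS))
```

The trie only shares common prefixes, while the merged automaton also shares
common endings such as "s", "ed" and "ing", so it needs far fewer states for
an inflected word list.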

The question is, what can we do now to help the process of getting LT into 
Debian?

Regards
 Daniel

-- 
http://www.danielnaber.de


