
Re: OCR questions

On 7/21/07, Wayne Topa <linuxone@intergate.com> wrote:
Nelson Castillo(nelsoneci@gmail.com) is reported to have said:
> On 7/21/07, Osamu Aoki <osamu@debian.org> wrote:
> >On Sat, Jul 21, 2007 at 10:53:09PM +0200, Florian Kulzer wrote:
> >> On Sat, Jul 21, 2007 at 22:25:43 +0200, Rodolfo Medina wrote:
> >> Why not use the Debian package? It is called "tesseract-ocr".
> >
> >Yes.  But it is the old 1.02 version and has an FTBFS bug.
> Yes, it's old. I installed from sources but I don't get the charsets.
> tesseract test.tiff out
> Unable to load unicharset file /usr/local/share/tessdata/eng.unicharset
> How do I get them?

1.  apt-cache search tesseract-ocr
tesseract-ocr - Command line OCR tool
tesseract-ocr-data - Command line OCR tool data

2.  aptitude install tesseract-ocr tesseract-ocr-data

3.  less /usr/share/doc/tesseract-ocr/README

This is in testing.  YMMV if you're running etch.


I run sid. I wanted the latest version. The Debian installation is OK.
But it's old.
Now I just noticed that the language files are not installed by default.
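
A quick way to check whether the file named in the error message is
actually present is sketched below. The find_tessdata helper is
hypothetical (not part of tesseract), and /usr/share/tessdata is only an
assumed alternative location; /usr/local/share/tessdata is the path from
the error above.

```shell
#!/bin/sh
# Hypothetical helper: print the first directory that contains
# LANG.unicharset and return 0, or return 1 if none of them does.
# Usage: find_tessdata LANG DIR...
find_tessdata() {
    lang=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/$lang.unicharset" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# /usr/local/share/tessdata comes from the error message;
# /usr/share/tessdata is an assumption, adjust it for your install.
if found=$(find_tessdata eng /usr/local/share/tessdata /usr/share/tessdata); then
    echo "eng.unicharset found in $found"
else
    echo "eng.unicharset not found; the language data still has to be installed"
fi
```

If it reports "not found", the language files are missing from both
directories, which would match the error above.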

I just found this:

 To be completely language independent, there is *no* language
 data with the source, so you have to download a separate language
 file to get it to work at



