Re: OCR questions
On 7/21/07, Wayne Topa <firstname.lastname@example.org> wrote:
Nelson Castillo(email@example.com) is reported to have said:
> On 7/21/07, Osamu Aoki <firstname.lastname@example.org> wrote:
> >On Sat, Jul 21, 2007 at 10:53:09PM +0200, Florian Kulzer wrote:
> >> On Sat, Jul 21, 2007 at 22:25:43 +0200, Rodolfo Medina wrote:
> >> Why not use the Debian package? It is called "tesseract-ocr".
> >Yes. But it is old 1.02 version and has FTBFS bug.
> Yes, it's old. I installed from sources but I don't get the charsets.
> tesseract test.tiff out
> Unable to load unicharset file /usr/local/share/tessdata/eng.unicharset
> How do I get them?
1. apt-cache search tesseract-ocr
tesseract-ocr - Command line OCR tool
tesseract-ocr-data - Command line OCR tool data
2. aptitude install tesseract-ocr tesseract-ocr-data
3. less /usr/share/doc/tesseract-ocr/README
This is in testing. YMMV if you're running etch.
I run sid. I wanted the latest version. The Debian installation is OK.
But it's old.
Now I just noticed that the language files are not installed by default.
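A quick way to confirm that, as a rough sketch: the default directory below is simply the one from the error message above, and check_tessdata is a made-up helper name, not part of tesseract.

```shell
# Check whether the English unicharset file exists under a tessdata directory.
# The default path is copied from the error message quoted earlier in this
# thread; pass a different directory as the first argument if your install
# prefix differs. check_tessdata is a hypothetical helper, not a tesseract tool.
check_tessdata() {
    dir="${1:-/usr/local/share/tessdata}"
    if [ -f "$dir/eng.unicharset" ]; then
        echo "found: $dir/eng.unicharset"
    else
        echo "missing: $dir/eng.unicharset"
        return 1
    fi
}
```

Run it as "check_tessdata" for the default path, or give it another tessdata directory as the only argument. It only tests for the file; it does not download anything.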
I just found this:
To be completely language independent, there is *no* language
data with the source, so you have to download a separate language
file to get it to work at