Re: OCR questions
Rodolfo Medina wrote:
> I tried gocr and the result was quite miserable. Then I tried with MS Windows
> and it was almost perfect. Somewhere in the web I read that OCR software under
> Linux is very poor at the moment and that it's better to use MS Windows for
> that: unfortunately my test seems to confirm that. What do you Debian listers
> think?
bob@proulx.com (Bob Proulx) writes:
> I think you should check out these articles.
>
> http://google-code-updates.blogspot.com/2006/08/announcing-tesseract-ocr.html
>
> http://code.google.com/p/tesseract-ocr/
>
> http://www.linux.com/articles/57222
>
> http://sourceforge.net/projects/tesseract-ocr
I tried tesseract, and am sorry to say that, at least with the Italian
language, it works much better than gocr but still noticeably worse than the
MS Windows software that came with my Canon CanoScan LIDE 25 scanner. I don't
like that, but unfortunately it is true in my experience.
Here is the installation procedure I followed, from source:
1) from `http://code.google.com/p/tesseract-ocr/downloads/list' I downloaded
the files tesseract-2.00.tar.gz and tesseract-2.00.ita.tar.gz, put them in
my ~/tmp directory and unpacked them the usual way: tar xzvf <package-name>;
2) I copied all the files from ~/tmp/tessdata into
~/tmp/tesseract-2.00/tessdata;
3) I built the program, installed it (make install as root) and ran it on a
scanned page:
   $ cd ~/tmp/tesseract-2.00
   $ ./configure
   $ make
   # make install
   $ tesseract document.tif document -l ita
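For anyone who wants to repeat this, the steps above could be wrapped in a
small shell script roughly like the following. It is only a sketch: the
function names and the WORKDIR, TESS_VERSION and TESS_LANG variables are made
up for illustration, and it assumes both tarballs are already in ~/tmp and
that the language tarball unpacks into ~/tmp/tessdata, as it did for me.

```shell
#!/bin/sh
# Sketch of the build steps described above. Nothing runs until you call
# the functions yourself. All names here are hypothetical helpers.

TESS_VERSION="2.00"                 # version of the downloaded tarballs
TESS_LANG="ita"                     # language pack (Italian)
WORKDIR="${WORKDIR:-$HOME/tmp}"     # where the tarballs were saved

# Print the tesseract command line for one scanned image.
# The output base name is the input name without its extension.
ocr_command() {
    input=$1
    base=${input%.*}
    printf 'tesseract %s %s -l %s\n' "$input" "$base" "$TESS_LANG"
}

# Unpack both tarballs, copy the language data into the source tree,
# then configure and compile. Installing is left as a separate step
# because it needs root.
build_tesseract() {
    cd "$WORKDIR" || return 1
    tar xzf "tesseract-$TESS_VERSION.tar.gz" || return 1
    tar xzf "tesseract-$TESS_VERSION.$TESS_LANG.tar.gz" || return 1
    cp tessdata/* "tesseract-$TESS_VERSION/tessdata/" || return 1
    cd "tesseract-$TESS_VERSION" || return 1
    ./configure && make
}
```

After sourcing the file, one would call build_tesseract, run make install as
root from the source directory, and then run the command that
ocr_command document.tif prints.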
Bye,
Rodolfo