Re: tesseract: ocr that works
On Dec 28, 5:10 am, Anthony Campbell <a...@acampbell.org.uk> wrote:
> On 21 Dec 2008, Hugo Vanwoerkom wrote:
> Yes, tesseract does work well. Here, xsane gives depth 24, but conversion
> to depth 8 is neither possible nor necessary. Following the docs, I did
There is an option at the top of the Preferences/Filetype tab to save
in 8-bit, but glad to know this isn't needed.
> export TESSDATA_PREFIX="/usr/share/tesseract-ocr/"
> There was no need for "-l eng" since I only had the English version of
> tesseract installed. So to scan a page saved at 300 dpi I just do:
> tesseract foo.dvi foo
> The result is excellent. I got pretty good results with ocrad but
> tesseract is definitely better.
I got poor results on a plain text sample, and much better using gocr
with the same scan saved by xsane in pnm format. I see your input
file is a DVI. Does that format yield better results than TIFF? If so,
how did you convert to that from the formats that xsane will save to?
Took me a while to figure out that tesseract will not read a TIFF if
its file extension is 'tiff' instead of 'tif'. Hadn't quite noticed
that in the previous poster's instructions.
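
For anyone else tripping over the extension, here is a minimal shell sketch of the workaround: rename the scan so it ends in .tif before handing it to tesseract. The helper name tif_name and the file name foo.tiff are made up for illustration, and the tesseract call itself is left commented out since I have only seen the extension problem with the version installed here.

```shell
# Hypothetical helper: print the given file name with a trailing
# ".tiff" changed to ".tif"; any other name is printed unchanged.
tif_name() {
    case "$1" in
        *.tiff) printf '%s\n' "${1%.tiff}.tif" ;;
        *)      printf '%s\n' "$1" ;;
    esac
}

# Usage sketch (commented out so the snippet has no side effects;
# foo.tiff is a placeholder for a scan saved by xsane):
# export TESSDATA_PREFIX="/usr/share/tesseract-ocr/"
# mv foo.tiff "$(tif_name foo.tiff)"
# tesseract "$(tif_name foo.tiff)" foo
```

Names that already end in .tif, or in something else such as .pnm, pass through untouched, so the helper is safe to call on any scan before deciding what to do with it.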