Jonathan Kaye wrote:
Hugo Vanwoerkom wrote:
Hi, I've tried now about 5 times to post a thread on an OCR that is open source, a Debian package, and works fantastic. But the post does not show up. What's up?
Hugo

Hi Hugo,
This message showed up. Did you send this last one from the same account as the others? Anyway, I'd love to hear about the open-source OCR, and I'm sure many others would as well. Can you give us the details?

The OCR is tesseract-ocr. These are the steps:
1. apt-get install tesseract-ocr
2. apt-get install tesseract-eng
3. Use xsane to scan a page at 300 dpi and save it as .tif.
4. That file will be depth 16, which tesseract can't handle, so reduce the depth: convert foo.tif -depth 8 foo.x1.tif
5. Run tesseract: tesseract foo.x1.tif foo -l eng
6. The text will show up as foo.txt.

Works almost faultlessly for me: I have problems with single quotes and dashes, but it recognizes all words perfectly.
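
For anyone who wants to script steps 4 to 6, a minimal sketch might look like the following. It assumes ImageMagick's convert and tesseract are both on the PATH and that the scan ends in .tif. The function name ocr_page and the DRY_RUN switch are made up for illustration and are not part of either tool.

```shell
#!/bin/sh
# Sketch of steps 4-6: reduce a 16-bit scan to 8-bit depth, then OCR it.
# Assumes ImageMagick's `convert` and `tesseract` are installed.
# ocr_page and DRY_RUN are hypothetical names used only for this example.

ocr_page() {
    input=$1                  # e.g. foo.tif, as saved by xsane
    base=${input%.tif}        # foo
    reduced=$base.x1.tif      # foo.x1.tif, the 8-bit copy

    if [ -n "$DRY_RUN" ]; then
        # Print the commands instead of running them.
        echo "convert $input -depth 8 $reduced"
        echo "tesseract $reduced $base -l eng"
        return 0
    fi

    # Step 4: reduce the depth; step 5: run the OCR only if that succeeded.
    convert "$input" -depth 8 "$reduced" &&
        tesseract "$reduced" "$base" -l eng   # step 6: text lands in $base.txt
}
```

Calling ocr_page foo.tif should leave the text in foo.txt. Setting DRY_RUN=1 first only prints the two commands, which is handy for checking the file names before touching a scan.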

I have samples too, but let's see if this can be posted.
Hugo