Re: post doesn't show up

To: debian-user@lists.debian.org
Subject: Re: post doesn't show up
From: Anthony Campbell <ac@acampbell.org.uk>
Date: Fri, 26 Dec 2008 12:55:47 +0000
Message-id: <20081226125547.GD4682@acampbell.org.uk>
Mail-followup-to: debian-user@lists.debian.org
In-reply-to: <gj02po$2f0$1@ger.gmane.org>
References: <giuapr$ves$1@ger.gmane.org> <givdi2$qj1$1@ger.gmane.org> <gj01tn$13t$1@ger.gmane.org> <gj02po$2f0$1@ger.gmane.org>

On 25 Dec 2008, Hugo Vanwoerkom wrote:

[snip] 

>> The OCR is tesseract-ocr. These steps:
>>
>> 1. apt-get install tesseract-ocr
>> 2. apt-get install tesseract-eng
>> 3. use xsane to scan a page at 300 dpi and save as .tif
>> 4. but that will be depth 16 which tesseract can't handle so reduce the 
>> depth: convert foo.tif -depth 8 foo.x1.tif
>> 5. run tesseract: tesseract foo.x1.tif foo -l eng
>> 6. text will show up as foo.txt.
>>
>> Works faultlessly with me: I have problems with single quotes and 
>> dashes but he recognizes all words perfectly.
>>
[snip] 

I agree that tesseract does work remarkably well. However, I omit the
'convert' step because for me this gives an error:

"convert: Caution: quantization tables are too coarse for baseline JPEG.`JPEGLib'."

In any case, the step seems to be unnecessary here. For me, xsane gives
a depth-24 image (not depth 16) and tesseract seems to be happy with
this. I also omit "-l eng" since I didn't include any other languages
when I installed tesseract. As suggested in the documentation, I put

export TESSDATA_PREFIX="/usr/share/tesseract-ocr/"

in .bashrc (note the final /). 
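
Since the final / matters, here is a minimal sketch of the same setting
with a guard that adds the slash if it is missing. The helper name
with_trailing_slash is made up for illustration; the path is the one
given above.

```shell
# Hypothetical helper: print the path with a trailing "/" added if missing.
with_trailing_slash() {
    # ${1%/} strips one trailing slash (if any); printf then adds one back.
    printf '%s/\n' "${1%/}"
}

# Same value as the export line above, whether or not the slash was typed.
TESSDATA_PREFIX="$(with_trailing_slash /usr/share/tesseract-ocr)"
export TESSDATA_PREFIX
```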

To make things work now I just do "tesseract foo.tif foo".
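
A minimal sketch of wrapping that command in shell functions, e.g. for
.bashrc. The names ocr_base and ocr_page are hypothetical, and the
sketch assumes the scan's file name ends in an extension such as .tif.

```shell
# Hypothetical helper: strip the last extension, so "foo.tif" gives "foo".
# Assumes the file name itself contains a dot.
ocr_base() {
    printf '%s\n' "${1%.*}"
}

# Hypothetical wrapper around the command above: the scan is the first
# argument and the output base is the same name without its extension.
# No "-l eng", matching the setup described in this message.
ocr_page() {
    tesseract "$1" "$(ocr_base "$1")"
}
```

With these defined, "ocr_page foo.tif" runs "tesseract foo.tif foo",
which leaves the text in foo.txt.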

I'm impressed. I mentioned ocrad a few posts ago here; that works too,
but there are more errors than with tesseract.

Anthony

-- 
Anthony Campbell - ac@acampbell.org.uk 
Microsoft-free zone - Using Debian GNU/Linux
http://www.acampbell.org.uk (blog, book reviews, 
and sceptical articles)
