Re: OCR questions

To: debian-user@lists.debian.org
Subject: Re: OCR questions
From: Rodolfo Medina <rodolfo.medina@gmail.com>
Date: Sun, 22 Jul 2007 11:43:09 +0200
Message-id: <87k5ssrcya.fsf@gmail.com>
References: <877iqe4s98.fsf@gmail.com> <20070608085703.cbbb955a.celejar@gmail.com> <20070609045125.GH4974@localhost.localdomain> <87644dhd4y.fsf_-_@gmail.com> <20070721181027.GA2633@dementia.proulx.com>

Rodolfo Medina wrote:

> I tried gocr and the result was quite miserable.  Then I tried with MS
> Windows and it was almost perfect.  Somewhere in the web I read that OCR
> software under Linux is very poor at the moment and that it's better to use
> MS Windows for that: unfortunately my test seems to confirm that.  What do
> you Debian listers think?



bob@proulx.com (Bob Proulx) writes:

> I think you should check out these articles.
>
>   http://google-code-updates.blogspot.com/2006/08/announcing-tesseract-ocr.html
>
>   http://code.google.com/p/tesseract-ocr/
>
>   http://www.linux.com/articles/57222
>
>   http://sourceforge.net/projects/tesseract-ocr



I tried tesseract, and I'm sorry to say that, at least with Italian, it works
much better than gocr but still noticeably worse than the MS Windows software
that came with my Canon CanoScan LIDE 25 scanner.  I don't like that, but
unfortunately it is true in my experience.

Here is the installation procedure I followed, from source:

 1) From `http://code.google.com/p/tesseract-ocr/downloads/list' I downloaded
    the files tesseract-2.00.tar.gz and tesseract-2.00.ita.tar.gz, put them in
    my ~/tmp directory and unpacked each one the usual way:
    tar xzvf <package-name>;

 2) I copied all the files from ~/tmp/tessdata into
    ~/tmp/tesseract-2.00/tessdata;

 3) $ cd ~/tmp/tesseract-2.00
    $ ./configure
    $ make
    # make install
    $ tesseract document.tif document -l ita
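
The steps above could be collected into a small shell script along these
lines.  This is only a sketch under assumptions: the variable names, the run()
helper and the dry-run default are illustrative and not part of tesseract, and
both tarballs are assumed to be already downloaded into ~/tmp.  By default it
only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the build steps above. By default it only prints the commands
# (dry run); set DRY_RUN=0 to actually execute them.
# Assumptions (not from the original post): the variable names, the run()
# helper, and that both tarballs already sit in $SRC_DIR.

SRC_DIR="${SRC_DIR:-$HOME/tmp}"
VERSION="2.00"
LANG_CODE="ita"

# Print a command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_tesseract() {
    # Step 1: unpack the source and the language data.
    run cd "$SRC_DIR" || return 1
    run tar xzvf "tesseract-$VERSION.tar.gz" || return 1
    run tar xzvf "tesseract-$VERSION.$LANG_CODE.tar.gz" || return 1

    # Step 2: copy the language files into the source tree.
    run cp "$SRC_DIR"/tessdata/* "$SRC_DIR/tesseract-$VERSION/tessdata/" || return 1

    # Step 3: configure, build and install.
    run cd "$SRC_DIR/tesseract-$VERSION" || return 1
    run ./configure || return 1
    run make || return 1
    run make install   # needs root, as the '#' prompt in step 3 indicates
}

build_tesseract
```

Running it with DRY_RUN=0 as root would perform the actual build and install;
the OCR run itself is then the last command of step 3.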

Bye,
Rodolfo
