Re: OCR questions (was: How to acquire text so to edit it?)

To: debian-user@lists.debian.org
Subject: Re: OCR questions (was: How to acquire text so to edit it?)
From: Andrew Sackville-West <andrew@farwestbilliards.com>
Date: Sat, 21 Jul 2007 11:57:06 -0700
Message-id: <20070721185706.GM4323@localhost.localdomain>
Mail-followup-to: debian-user@lists.debian.org
In-reply-to: <20070721181027.GA2633@dementia.proulx.com>
References: <877iqe4s98.fsf@gmail.com> <20070608085703.cbbb955a.celejar@gmail.com> <20070609045125.GH4974@localhost.localdomain> <87644dhd4y.fsf_-_@gmail.com> <20070721181027.GA2633@dementia.proulx.com>

On Sat, Jul 21, 2007 at 08:10:27PM +0200, Bob Proulx wrote:
> Rodolfo Medina wrote:
> > Somewhere in the web I read that OCR software under Linux is very
> > poor at the moment and that it's better to use MS Windows for that:
> > unfortunately my test seems to confirm that.  What do you Debian
> > listers think?
> 
> I think you should check out these articles.
> 
>   http://google-code-updates.blogspot.com/2006/08/announcing-tesseract-ocr.html
> 
>   http://code.google.com/p/tesseract-ocr/
> 
>   http://www.linux.com/articles/57222

Hey, that looks pretty good. The linux.com article complains about
having to crop out photos manually and about the limited file formats
it accepts (TIFF only), but those are pretty minor. It should be fairly
simple to put wrappers around it that clean up the images and convert
file formats, to get data into the thing without having to grok the OCR
code. IOW, I would expect to see this get used as a backend in various
existing graphics code bases to make OCR really viable in OSS.
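
Something like the following is roughly the kind of wrapper meant here. It is only a sketch, assuming ImageMagick's `convert` and the `tesseract` command-line tool are both installed and on the PATH; the function names `build_commands` and `ocr` are made up for illustration and are not part of either tool. It handles only the format conversion, not cropping out photos.

```python
import subprocess
from pathlib import Path


def build_commands(image, workdir="."):
    """Return (convert_cmd, tesseract_cmd, text_path) for one input image.

    Hypothetical helper: names and layout are illustrative only.
    """
    image = Path(image)
    workdir = Path(workdir)
    tiff = workdir / (image.stem + ".tif")
    out_base = workdir / image.stem
    # ImageMagick chooses the output format from the file extension.
    convert_cmd = ["convert", str(image), str(tiff)]
    # tesseract takes an input image and an output base name,
    # and appends ".txt" to that base name itself.
    tesseract_cmd = ["tesseract", str(tiff), str(out_base)]
    text_path = workdir / (image.stem + ".txt")
    return convert_cmd, tesseract_cmd, text_path


def ocr(image, workdir="."):
    """Convert `image` to TIFF, run tesseract on it, return the text."""
    convert_cmd, tesseract_cmd, text_path = build_commands(image, workdir)
    Path(workdir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if either tool fails.
    subprocess.run(convert_cmd, check=True)
    subprocess.run(tesseract_cmd, check=True)
    return text_path.read_text(errors="replace")
```

Calling `ocr("scan.png", "out")` would then leave `out/scan.tif` and `out/scan.txt` behind and return the recognized text. Keeping the command construction separate from the execution makes it easy to swap in a different converter or to add cleanup options later.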

A
