
Re: OCR questions (was: How to acquire text so to edit it?)



On Sat, Jul 21, 2007 at 08:10:27PM +0200, Bob Proulx wrote:
> Rodolfo Medina wrote:
> > Somewhere in the web I read that OCR software under Linux is very
> > poor at the moment and that it's better to use MS Windows for that:
> > unfortunately my test seems to confirm that.  What do you Debian
> > listers think?
> 
> I think you should check out these articles.
> 
>   http://google-code-updates.blogspot.com/2006/08/announcing-tesseract-ocr.html
> 
>   http://code.google.com/p/tesseract-ocr/
> 
>   http://www.linux.com/articles/57222

Hey, looks pretty good. The linux.com article complains about having
to manually crop out photos and about the limited file formats it
accepts (TIFF only), but those are pretty minor. It should be fairly
simple to put wrappers around it that clean up the images and convert
file formats, to get data into the thing without having to grok OCR
code. IOW, I would expect to see this get used as a backend in various
existing graphics code bases to make OCR really viable in OSS.
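
Something like the following is roughly the kind of wrapper I have in
mind. It is only a sketch, not tested against a real install: it
assumes ImageMagick's "convert" and the "tesseract" binary are both on
the PATH, and the function names (build_commands, ocr_image) are made
up for illustration. It converts whatever image it is given to TIFF
and hands that to tesseract, which writes its text to <outputbase>.txt.

```python
import subprocess
import tempfile
from pathlib import Path


def build_commands(image_path, out_base, tiff_path):
    """Return the two command lines the wrapper would run (no side effects)."""
    # ImageMagick: convert <input> <output>; the output format follows
    # the file extension, so a .tif name yields a TIFF file.
    convert_cmd = ["convert", str(image_path), str(tiff_path)]
    # tesseract <image> <outputbase> writes the text to <outputbase>.txt
    ocr_cmd = ["tesseract", str(tiff_path), str(out_base)]
    return convert_cmd, ocr_cmd


def ocr_image(image_path, out_base):
    """Convert an image to TIFF in a temp dir, OCR it, return the text."""
    with tempfile.TemporaryDirectory() as tmp:
        tiff_path = Path(tmp) / (Path(image_path).stem + ".tif")
        convert_cmd, ocr_cmd = build_commands(image_path, out_base, tiff_path)
        # check=True raises CalledProcessError if either tool fails.
        subprocess.run(convert_cmd, check=True)
        subprocess.run(ocr_cmd, check=True)
    return Path(str(out_base) + ".txt").read_text()
```

Calling ocr_image("scan.png", "scan") would then leave the text in
scan.txt and return it. Cropping out photos and other cleanup would
still have to be bolted on before the convert step.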

A
