
Re: Desperate for Optical Character Recognition



>Does anyone know of any OCR software available for Linux? Preferably
>something that will plug into gimp with SANE and xscanimage - but anything will
>do, console mode or X - I'm dying never to have to reboot to Windoze again! :).


Me too!  

There is an old project, xocr (located at sunsite under apps/graphics). 
I don't know how good it is, because I haven't been able to get it to
compile.  Evidently it requires some libraries that I don't have.  If
anyone has gotten xocr compiled, let me know; I'd be interested in
finding out how you did it.

There is another old project, OCRchie at 

http://http.cs.berkeley.edu/~fateman/kathey/ocrchie.html
  
that is basically the result of a class project.  I haven't looked at it
yet.

There is the startup of a project at
http://starship.skyport.net/crew/amk/ocr/, but that doesn't help us any.

Another project that looks interesting is SOCR at

 http://www.cs.waikato.ac.nz/~singlis/ocr/

This is a university-supported project to create a GPL'd OCR.  It's
still in its infancy, however.

The only OCR I've found which is reasonably functional is from the
scanshop people (Vividata at www.vividata.com).

Their product is called OCRShop and it's built using Caere's Recognition
Engine.  It does fairly well.  The downside is that it is commercial
software.  Their asking price for it is something like 750.00.  Gulp. 
They do give you a free thirty day trial on it, which I'm currently
using.  But even with the educational discount (25% off) it's still
entirely outside my budget.  Heck, I built my last computer for less.  

It's not a GIMP plugin, of course.  What I do is scan in using SANE +
GIMP and save as a tiff file.  Then I pass the tiff file on to OCRShop. 
Guess I should get back to scanning and rendering because I've only got
three more days left on the trial license...
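
For anyone who wants to script the scan-then-OCR routine above, here is a
rough sketch in Python.  It rests on assumptions that are mine and not part
of the setup described: it uses the command-line scanimage tool from SANE
rather than the GIMP front end, and "my-ocr-program" is a made-up
placeholder, since how OCRShop is invoked from a shell isn't covered here.
It only builds the commands; running them is left to you.

```python
import shlex
from pathlib import Path

# Hypothetical placeholder: substitute whatever command your OCR program
# actually uses. This is NOT the real OCRShop command line.
OCR_COMMAND = ["my-ocr-program"]


def scan_command(resolution_dpi=300):
    """Build a scanimage call that writes a TIFF image to standard output."""
    # --format=tiff is a standard scanimage option; --resolution is
    # provided by most scanner backends but depends on the device.
    return ["scanimage", "--format=tiff", "--resolution", str(resolution_dpi)]


def ocr_jobs(scan_dir):
    """Return one (command, output_path) pair per TIFF file, sorted by name."""
    jobs = []
    for tiff in sorted(Path(scan_dir).glob("*.tif*")):
        # The text output sits next to the image, with a .txt suffix.
        jobs.append((OCR_COMMAND + [str(tiff)], tiff.with_suffix(".txt")))
    return jobs


if __name__ == "__main__":
    # Print the commands instead of running them, so nothing is scanned
    # or recognized until you have checked them against your own setup.
    print(shlex.join(scan_command()))
    for command, output in ocr_jobs("."):
        print(shlex.join(command), "->", output)
```

Redirect the scanimage output to a .tiff file for each page, then feed the
printed OCR commands to your shell once the placeholder is replaced.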

Oh, I did ask them if they would consider giving me a significant discount
on the software (better to earn 100 than nothing, right?), but they replied
that they are under a licensing agreement with Caere which is evidently
pretty pricey for them. 

On the other hand, OCRShop is still beta: about once per session it
chokes on a tiff file and segfaults.  It also isn't quite as
configurable as one might want--duh, like commercial software ever is...

Once I get done with the prospectus for my dissertation I'm going to
volunteer to help out with one of the two ongoing GPL projects.  If
anyone knows of any other projects let me know...  There really is a
need for a good GPL'd OCR, particularly for academics like myself in the
humanities.  

That's all I've got, Timothy.  Good luck!

--Don


--  
To UNSUBSCRIBE, email to debian-user-request@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org

