
Re: OCR



On Thu, Apr 28, 2016 at 01:15:40PM +0200, Sebastian Humenda wrote:
> Hi,
> 
> MENGUAL Jean-Philippe schrieb am 27.04.2016, 20:01 +0200:
> >After testing various OCR tools, I feel that Tesseract, the most advanced OCR
> >engine on Linux, still cannot match the performance of
> >commercial utilities. Even when it's wrapped in tools like Lios
> >or gimagereader, it is still difficult to use for "basic"
> >users (I mean, the Windows users who don't have any technical knowledge
> >or who use a computer just for their everyday needs).
> I expect you don't mean performance as a technical term? Technically speaking,
> I think Tesseract performs all right.
I do think performance was meant as "recognition accuracy". Although I
would rather promote Tesseract and free solutions, we have to acknowledge
that proprietary tools such as FineReader have better recognition rates,
structure analysis, and format conversion (e.g. to .odt).

> Well if it's a really recent version of Finereader, I'd be interested.
It's precisely FineReader v11.1.9.622165.

> >- A package to run it on MATE. 2 ways:
> >* from an image file, right-click, choose the proper option
> >* from a scanner: we give a command to create a binding (as ours is
> >linked against Compiz).
> Recognizing images is probably the most interesting feature for me, Tesseract
> works (mostly) fine for my daily mail. However it should also recognize PDFs,
> which Finereader does anyway.
> 
> Please let me know when there's development in this direction.
There are three ways of using this OCR package:
  - It is integrated into the Caja file manager in MATE (maybe later in Nautilus if there is interest), so one can right-click -> launch OCR on files.
  - One can bind a shortcut in one's favorite window manager to launch a scan + OCR and get the results in a LibreOffice document.
  - Users can access more flexible options from the command line.
These features work identically with Tesseract or
FineReader. So one can use Tesseract transparently, including with PDFs,
but one will get better recognition with the proprietary engine.
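
To give an idea of what the command-line path with the free engine can look
like, here is a rough Python sketch. It is not the code of our package, only
an illustration. It assumes the standard `tesseract <input> <outputbase> -l <lang>`
invocation and poppler's `pdftoppm` to turn PDF pages into images first; the
function names, the "-page" file prefix and the 300 dpi default are made up
for the example.

```python
import subprocess
from pathlib import Path
from typing import List


def rasterize_command(pdf: Path, prefix: Path, dpi: int = 300) -> List[str]:
    """Build the pdftoppm call that writes one PNG per PDF page."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)]


def tesseract_command(image: Path, output_base: Path, lang: str = "eng") -> List[str]:
    """Build the tesseract call; tesseract itself appends '.txt' to output_base."""
    return ["tesseract", str(image), str(output_base), "-l", lang]


def ocr_file(path: Path, output_base: Path, lang: str = "eng") -> None:
    """Run OCR on an image, or on every page of a PDF (hypothetical helper)."""
    if path.suffix.lower() != ".pdf":
        subprocess.run(tesseract_command(path, output_base, lang), check=True)
        return
    # Assumption of this sketch: PDFs are rasterized before recognition.
    prefix = output_base.with_name(output_base.name + "-page")
    subprocess.run(rasterize_command(path, prefix), check=True)
    # pdftoppm names the pages '<prefix>-<n>.png'; OCR each one in turn.
    for page in sorted(prefix.parent.glob(prefix.name + "-*.png")):
        subprocess.run(tesseract_command(page, page.with_suffix(""), lang), check=True)
```

With both tools installed, `ocr_file(Path("letter.pdf"), Path("letter"), "fra")`
would leave one text file per page next to the output base. A proprietary
engine would need its own command builder in place of `tesseract_command`.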

happy hacking
-- 
Ksamak
hypra.fr Team
