
Re: Suggestions for tesseract



Bob Bernstein wrote:
> Executing 'apt-cache search tesseract' brings up a multitude of 
> packages.
> 
> My need is simple enough, I think: I like to scan (using an 
> Epson scanner) pages of printed books -- almost one hundred per 
> cent text -- and then use OCR to produce pages from which I can 
> copy 'n paste snippets of text for note-taking purposes.
> 
> What do the assembled multitudes suggest for a tesseract package 
> (that's the OCR I've been encouraged to use) on my bullseye 
> system, ...

Once you have a PDF containing the scanned images (img2pdf can create
one), I think the cleverest approach is to use ocrmypdf.
It adds an OCR text layer to the PDF, so the text becomes selectable
and can be copied and pasted.
It uses the Tesseract OCR engine under the hood.

$ ocrmypdf -f inputfile.pdf outputfile.pdf
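Putting the two steps together, a minimal sketch might look like the shell function below. It assumes the Debian img2pdf and ocrmypdf packages are installed (ocrmypdf depends on Tesseract) and that the scans are English text. The function name ocr_scans and the file names are hypothetical. The -f flag is left out here because a PDF freshly built from images has no existing text layer; ocrmypdf's -f (--force-ocr) is only needed when the input already contains text.

```shell
# Hypothetical helper: ocr_scans OUTPUT.pdf IMAGE...
# Assumes img2pdf and ocrmypdf are on PATH (e.g. from the Debian packages).
ocr_scans() {
  out="$1"
  shift
  # Intermediate image-only PDF, named after the output file.
  tmp="${out%.pdf}-scans.pdf"
  # Step 1: bundle the scanned page images into one PDF.
  img2pdf "$@" -o "$tmp" || return 1
  # Step 2: add a selectable Tesseract text layer; -l eng selects English.
  ocrmypdf -l eng "$tmp" "$out"
}
```

Usage, for example: `ocr_scans book-ocr.pdf page-*.png`. Writing to a separate output file keeps the original scans untouched if OCR needs to be rerun with different options.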

