
Re: Suggestions for tesseract



On 2022-01-20, Siard <shiems@mailbox.org> wrote:
> Bob Bernstein wrote:
>> Executing 'apt-cache search tesseract' brings up a multitude of 
>> packages.
>> 
>> My need is simple enough, I think: I like to scan (using an 
>> Epson scanner) pages of printed books -- almost one hundred per 
>> cent text -- and then use OCR to produce pages from which I can 
>> copy 'n paste snippets of text for note-taking purposes.
>> 
>> What do the assembled multitudes suggest for a tesseract package 
>> (that's the OCR I've been encouraged to use) on my bullseye 
>> system, ...
>
> Once you have a PDF containing the images (img2pdf may be used for
> that), I think the cleverest way is to use ocrmypdf.
> It adds an OCR text layer to the PDF file, so the PDF text becomes
> selectable and can be copied.
> It uses the Tesseract OCR engine.
>
> $ ocrmypdf -f inputfile.pdf outputfile.pdf
>

ocrmypdf has quite a few dependencies on my machine.

The multitude of packages corresponds more or less to the multiple
languages of the human multitude: there is one language data package
per language. I guess the OP is working in English, in which case the
one that matters is 'tesseract-ocr-eng' (pulled in here along with all
the others when I installed the above).
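
For what it's worth, here is a rough sketch of how the pieces discussed
above might fit together. It only prints the commands rather than running
them. The file names (page-*.png, book.pdf, book-ocr.pdf) and the helper
names lang_pkg and plan are made up for illustration, and the package
naming rule is inferred from the 'tesseract-ocr-eng' pattern, so check it
against 'apt-cache search tesseract-ocr' before relying on it.

```shell
#!/bin/sh
# Sketch only: prints a plausible command sequence, runs nothing.
# File names below are hypothetical placeholders.

# Turn a Tesseract language code into the assumed Debian package name.
# Debian package names cannot contain '_', so codes like chi_sim are
# assumed to map to tesseract-ocr-chi-sim.
lang_pkg() {
    printf 'tesseract-ocr-%s\n' "$1" | tr '_' '-'
}

# Print the install, image-to-PDF, and OCR steps for one language code.
plan() {
    lang=$1
    echo "sudo apt install tesseract-ocr $(lang_pkg "$lang") img2pdf ocrmypdf"
    echo "img2pdf page-*.png -o book.pdf"
    echo "ocrmypdf -l $lang book.pdf book-ocr.pdf"
}

plan eng
```

Swapping 'eng' for another code (say 'deu') changes both the package to
install and the language passed to ocrmypdf with -l.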




