Re: Suggestions for tesseract
On 2022-01-20, Siard <shiems@mailbox.org> wrote:
> Bob Bernstein wrote:
>> Executing 'apt-cache search tesseract' brings up a multitude of
>> packages.
>>
>> My need is simple enough, I think: I like to scan (using an
>> Epson scanner) pages of printed books -- almost one hundred per
>> cent text -- and then use OCR to produce pages from which I can
>> copy 'n paste snippets of text for note-taking purposes.
>>
>> What do the assembled multitudes suggest for a tesseract package
>> (that's the OCR I've been encouraged to use) on my bullseye
>> system, ...
>
> Once you have a PDF containing the images (img2pdf may be used for
> that), I think the cleverest way is to use ocrmypdf.
> It adds an OCR text layer to the PDF file, so the PDF text becomes
> selectable and can be copied.
> It uses the Tesseract OCR engine.
>
> $ ocrmypdf -f inputfile.pdf outputfile.pdf
>
ocrmypdf pulls in quite a few dependencies on my machine.
Most of what 'apt-cache search tesseract' lists are language data
packages, more or less one per language of the human multitude. I guess
the OP is working in English, so the one that matters is
'tesseract-ocr-eng' (pulled in here, along with all the others, when
installing the above).
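
For what it's worth, a minimal sketch of the whole chain for one
language, assuming the bullseye package names 'tesseract-ocr',
'tesseract-ocr-eng', 'ocrmypdf' and 'img2pdf'. The helper names (run,
ocr_plan) and the file names (scan.pdf, scan-ocr.pdf) are made up for
illustration. By default it only prints the commands so they can be
checked first.

```shell
#!/bin/sh
# Sketch only. Each command is printed rather than executed unless RUN=1
# is set in the environment (the apt step then needs root).

run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# ocr_plan LANG IMAGE...
# LANG is a Tesseract language code such as eng; IMAGE... are scanned pages.
ocr_plan() {
    lang=$1
    shift
    # Debian package names use '-' where Tesseract codes use '_'
    # (e.g. chi_sim is packaged as tesseract-ocr-chi-sim).
    pkg=tesseract-ocr-$(printf '%s' "$lang" | tr '_' '-')

    # Engine plus a single language pack, instead of every tesseract-ocr-*.
    run apt install tesseract-ocr "$pkg" ocrmypdf img2pdf

    # Bundle the page images into one PDF (placeholder name scan.pdf).
    run img2pdf "$@" -o scan.pdf

    # Add the selectable text layer, using only the chosen language.
    run ocrmypdf -l "$lang" scan.pdf scan-ocr.pdf
}
```

Calling 'ocr_plan eng page-001.png page-002.png' prints the three
commands; with RUN=1 they are executed instead. Passing the language
with -l keeps ocrmypdf to the one language pack that is actually wanted.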