Re: Suggestions for tesseract

To: debian-user@lists.debian.org
Subject: Re: Suggestions for tesseract
From: Curt <curty@free.fr>
Date: Thu, 20 Jan 2022 18:44:55 -0000 (UTC)
Message-id: <slrnsujbd7.1c3.curty@einstein.electron.org>
References: <1n1sopr-4174-828-r762-3n9145op135@ehcgherq-qhpx.pbz> <20220120182621.3c8999b8245f20ea2bafd5be@mailbox.org>

On 2022-01-20, Siard <shiems@mailbox.org> wrote:
> Bob Bernstein wrote:
>> Executing 'apt-cache search tesseract' brings up a multitude of 
>> packages.
>> 
>> My need is simple enough, I think: I like to scan (using an 
>> Epson scanner) pages of printed books -- almost one hundred per 
>> cent text -- and then use OCR to produce pages from which I can 
>> copy 'n paste snippets of text for note-taking purposes.
>> 
>> What do the assembled multitudes suggest for a tesseract package 
>> (that's the OCR I've been encouraged to use) on my bullseye 
>> system, ...
>
> Once you have a PDF containing the images (img2pdf may be used for
> that), I think the cleverest way is to use ocrmypdf.
> It adds an OCR text layer to the PDF file, so the PDF text becomes
> selectable and can be copied.
> It uses the Tesseract OCR engine.
>
> $ ocrmypdf -f inputfile.pdf outputfile.pdf
>

ocrmypdf has quite a few dependencies on my machine.

The multitude of packages corresponds, more or less, to the multitude
of human languages: most of them are per-language data packages. I guess
the OP is working in English, which means 'tesseract-ocr-eng' (pulled in
here, along with all the other dependencies, when I installed ocrmypdf).
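
If the aim is just copy-and-paste text, running tesseract directly on the
scanned images may be enough. A rough sketch follows, not a tested recipe:
the page-*.png file names, the ocr_pages function and the DRY_RUN switch are
my own assumptions for illustration, not anything from the thread.

```shell
#!/bin/sh
# Sketch: OCR each scanned page image to a plain-text file with tesseract.
# Assumptions (not from the thread): the scans are PNG files named
# page-*.png in the current directory, and the English data is installed,
# e.g. with: sudo apt install tesseract-ocr tesseract-ocr-eng
# Set DRY_RUN=1 to print the commands instead of running them.

ocr_pages() {
    lang="${1:-eng}"                      # language code, English by default
    for img in page-*.png; do
        [ -e "$img" ] || continue         # no matching scans: do nothing
        base="${img%.png}"                # output name without the extension
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "tesseract $img $base -l $lang"
        else
            tesseract "$img" "$base" -l "$lang"   # writes $base.txt
        fi
    done
}
```

Calling 'ocr_pages eng' in the directory holding the scans should leave one
.txt file next to each image; for another language, install the matching
tesseract-ocr-* package and pass its code instead.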
