Re: Suggestions for tesseract

To: debian-user@lists.debian.org
Subject: Re: Suggestions for tesseract
From: Siard <shiems@mailbox.org>
Date: Thu, 20 Jan 2022 18:26:21 +0100
Message-id: <20220120182621.3c8999b8245f20ea2bafd5be@mailbox.org>
In-reply-to: <1n1sopr-4174-828-r762-3n9145op135@ehcgherq-qhpx.pbz>
References: <1n1sopr-4174-828-r762-3n9145op135@ehcgherq-qhpx.pbz>

Bob Bernstein wrote:
> Executing 'apt-cache search tesseract' brings up a multitude of 
> packages.
> 
> My need is simple enough, I think: I like to scan (using an 
> Epson scanner) pages of printed books -- almost one hundred per 
> cent text -- and then use OCR to produce pages from which I can 
> copy 'n paste snippets of text for note-taking purposes.
> 
> What do the assembled multitudes suggest for a tesseract package 
> (that's the OCR I've been encouraged to use) on my bullseye 
> system, ...

Once you have a PDF containing the scanned images (img2pdf can produce
one), I think the cleverest way is to use ocrmypdf.
It adds an OCR text layer to the PDF file, so the text on the page
images becomes selectable and can be copied.
It uses the Tesseract OCR engine.

$ ocrmypdf -f inputfile.pdf outputfile.pdf

(-f is short for --force-ocr: OCR every page, even pages that already
contain text.)
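For the scan-then-copy workflow in the original question, the two steps
can be chained. The function below is only a sketch: its name and the
file names are hypothetical, and it assumes the Debian packages img2pdf
and ocrmypdf are installed (sudo apt install img2pdf ocrmypdf) with the
English Tesseract data. Since img2pdf output contains only images, -f is
not needed here. --sidecar additionally writes the recognised text to a
plain .txt file, which may be handier for note-taking than copying from
the PDF.

```shell
# Hypothetical helper: scanned page images -> searchable PDF + text file.
# Assumes img2pdf and ocrmypdf are on PATH (Debian packages of those names).
scans_to_searchable_pdf() {
    # Usage: scans_to_searchable_pdf OUTPUT.pdf IMAGE...
    if [ "$#" -lt 2 ]; then
        echo "usage: scans_to_searchable_pdf OUTPUT.pdf IMAGE..." >&2
        return 2
    fi
    out=$1
    shift

    # Check that both tools exist before doing any work.
    for tool in img2pdf ocrmypdf; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            return 127
        fi
    done

    # Temporary image-only PDF (GNU mktemp, as shipped by Debian).
    tmp=$(mktemp --suffix=.pdf) || return 1

    # Step 1: wrap the images, one per page, into a PDF.
    # Step 2: add Tesseract's text layer; -l picks the OCR language and
    # --sidecar also writes the recognised text next to the output PDF.
    img2pdf -o "$tmp" "$@" &&
        ocrmypdf -l eng --sidecar "${out%.pdf}.txt" "$tmp" "$out"
    status=$?

    rm -f "$tmp"
    return "$status"
}
```

Usage, after pasting the function into a shell:

$ scans_to_searchable_pdf book.pdf page-001.png page-002.png

This would leave book.pdf (selectable text) and book.txt (plain text).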

Follow-Ups:
- Re: Suggestions for tesseract
  - From: Curt <curty@free.fr>

References:
- Suggestions for tesseract
  - From: Bob Bernstein <poobah@ruptured-duck.com>
