Re: proofing searchable pdf files

To: debian-user@lists.debian.org
Subject: Re: proofing searchable pdf files
From: Gary Dale <garydale@torfree.net>
Date: Thu, 30 Oct 2014 23:18:50 -0400
Message-id: <5452FF9A.7090902@torfree.net>
Reply-to: gary@extremeground.com
In-reply-to: <5452DC2D.2040502@verizon.net>
References: <5452DC2D.2040502@verizon.net>

On 30/10/14 08:47 PM, Gary Roach wrote:

Hi all,

Problem:
I am working on an archiving project and wish to archive documents to searchable pdf files but can't seem to figure out how to proofread and correct the text overlay. Any suggestions?
System:
    Debian Wheezy
    Intel i5-750 processor
    HP Officejet Pro 8600 wireless all in one printer/fax/scanner
    gscan2pdf software with Tesseract ocr
    300 to 600 dpi scans.
Tesseract seems to do a really great job but I have no good way of proving this or correcting any mistakes. Some of the documents are 100 years old and may not be in such great shape. I can always retype everything but would like to avoid this, as much as possible, for obvious reasons.
Gary R.

Tesseract is the tool for the job. Scan at 600 dpi for best results. If the originals are typed/typeset the results should be good, but you may have to do some fiddling with the scans to bring out the detail.

The fastest way to proofread is to pull the text into a word processor and run the spell checker. Grammar checking is also a help.
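
If you would rather script that first pass than use a word processor, something along these lines might do. It is only a rough sketch and it assumes a few things that are not part of your setup as described: that pdftotext (from poppler-utils) is installed, that a word list such as /usr/share/dict/words is available, and that the documents are in English. The function names are made up for the illustration.

```python
import re
import subprocess


def extract_text(pdf_path):
    """Return the text layer of a PDF via pdftotext (assumed to be installed)."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],  # "-" sends the text to stdout
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def load_words(path="/usr/share/dict/words"):
    """Load a one-word-per-line dictionary file (the path is an assumption)."""
    with open(path, encoding="utf-8") as fh:
        return {line.strip().lower() for line in fh if line.strip()}


def suspect_words(text, known_words):
    """Return (line_number, word) pairs for words not found in known_words."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Match alphabetic words, allowing internal apostrophes (e.g. "don't").
        for word in re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", line):
            if word.lower() not in known_words:
                hits.append((lineno, word))
    return hits
```

Calling suspect_words(extract_text("scan.pdf"), load_words()) gives a list of places to look at, where "scan.pdf" stands in for one of your files. Proper names and old spellings will show up as false alarms, so treat the output as a list of things to check rather than a list of errors.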

There are also Tesseract box editors you can try that let you edit the Tesseract OCR files.
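
To give an idea of what a box editor works on, here is a small sketch that reads a Tesseract box file. It assumes the plain box format, one line per symbol in the form "symbol left bottom right top page" with pixel coordinates measured from the bottom-left corner of the image. The names Box, parse_box_line and boxes_to_text are invented for the example and are not part of Tesseract.

```python
from collections import namedtuple

# One recognised symbol and its bounding box (field names are illustrative).
Box = namedtuple("Box", "symbol left bottom right top page")


def parse_box_line(line):
    """Parse one box-file line such as 'T 36 92 48 116 0'."""
    # Split from the right so the symbol may be more than one character.
    symbol, *numbers = line.rstrip("\n").rsplit(" ", 5)
    left, bottom, right, top, page = map(int, numbers)
    return Box(symbol, left, bottom, right, top, page)


def boxes_to_text(lines):
    """Join the symbols of all non-empty box lines into one string."""
    return "".join(parse_box_line(line).symbol for line in lines if line.strip())
```

Since a box file carries no line breaks or word spacing of its own, boxes_to_text only shows which symbols were recognised and in what order. A real box editor displays each box over the scanned image, which is what makes correcting practical.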


I thought Tesseract would let you adjust the search words when necessary.


References:
- proofing searchable pdf files
  - From: Gary Roach <gary719_list1@verizon.net>
