Bug#1017872: RFA: ocrmypdf -- add an OCR text layer to PDF files



Package: wnpp
Severity: normal
X-Debbugs-Cc: debian-python@lists.debian.org, barlow.jim@gmail.com
Control: affects -1 src:ocrmypdf

I request an adopter for the ocrmypdf package.  I no longer use it as
often as I did (hardly ever in the past couple of years), and in any case
it would be better maintained by a Python programmer.

The package description is:
 OCRmyPDF generates a searchable PDF/A file from a regular PDF
 containing only images, allowing it to be searched.
 .
 It uses the Tesseract OCR engine and so supports all the languages
 that Tesseract does.
 .
 Some other main features:
 .
   * Places OCR text accurately below the image to ease copy / paste
   * Keeps the exact resolution of the original embedded images
   * When possible, inserts OCR information as a lossless operation
     without rendering vector information
   * Keeps file size about the same
   * If requested deskews and/or cleans the image before performing OCR
   * Validates input and output files
   * Provides debug mode to enable easy verification of the OCR results
   * Processes pages in parallel when more than one CPU core is
     available
   * Battle-tested on thousands of PDFs, a test suite and continuous
     integration.

-- 
Sean Whitton


