
Bug#721287: RFP: paperwork -- a "scan & forget" tool to make papers searchable



Package: wnpp
Severity: wishlist

* Package name    : paperwork
  Version         : 0.1
  Upstream Author : Jerome Flesch <jflesch@gmail.com>
* URL             : https://github.com/jflesch/paperwork/
* License         : GPL
  Programming Lang: Python
  Description     : a "scan & forget" tool to make papers searchable

Paperwork is a tool to make papers searchable. The basic idea behind Paperwork is "scan & forget": you should be able to just scan a new document and forget about it until the day you need it again. Let the machine do most of the work.

Papers are organized into documents. Each document contains pages.

It relies mainly on three other pieces of software:

  * SANE: To scan the pages
  * Cuneiform or Tesseract: To extract the words from the pages (OCR)
  * GTK/Glade: For the user interface

Page orientation is automatically guessed using OCR.
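
One way such orientation guessing could work is to run OCR on the page at each of the four right-angle rotations and keep the rotation whose output looks most like real words. The sketch below illustrates that idea only; it is not taken from Paperwork's source. The `ocr` and `rotate` callables are hypothetical stand-ins for an OCR engine (Cuneiform or Tesseract) and an image rotation routine, and the scoring rule is an assumption chosen for simplicity.

```python
def guess_orientation(page, ocr, rotate, angles=(0, 90, 180, 270)):
    """Return the angle whose OCR output contains the most word-like tokens.

    `ocr(image) -> str` and `rotate(page, angle) -> image` are hypothetical
    callables supplied by the caller; they are not Paperwork's API.
    """
    def score(text):
        # Crude measure: count tokens of 3+ purely alphabetic characters.
        return sum(1 for tok in text.split() if len(tok) >= 3 and tok.isalpha())

    # On a tie, max() keeps the first angle in `angles`.
    return max(angles, key=lambda angle: score(ocr(rotate(page, angle))))
```

A real implementation would probably score against a dictionary rather than token shape, since OCR noise can also produce alphabetic runs.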

Paperwork uses a custom indexing system to search documents and to provide keyword suggestions. Since OCR is not perfect, and since some documents don't contain useful keywords, Paperwork also lets you attach labels to each document.
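
To make the idea concrete, here is a minimal sketch of a keyword index with prefix-based suggestions and user labels. It assumes a plain in-memory inverted index; the class name `DocIndex` and its methods are hypothetical and do not describe how Paperwork's own index is built.

```python
from collections import defaultdict


class DocIndex:
    """Toy inverted index: OCR words and user labels both map to document ids."""

    def __init__(self):
        self.words = defaultdict(set)   # OCR word -> set of document ids
        self.labels = defaultdict(set)  # user label -> set of document ids

    def add_document(self, doc_id, text, labels=()):
        for raw in text.lower().split():
            word = raw.strip(".,;:!?\"'()")
            if word:  # skip tokens that were only punctuation
                self.words[word].add(doc_id)
        for label in labels:
            self.labels[label.lower()].add(doc_id)

    def search(self, keyword):
        # A document matches if the keyword is in its OCR text or its labels.
        key = keyword.lower()
        return sorted(self.words.get(key, set()) | self.labels.get(key, set()))

    def suggest(self, prefix):
        # Suggest indexed OCR words that start with what the user has typed.
        p = prefix.lower()
        return sorted(w for w in self.words if w.startswith(p))
```

In this sketch a document whose OCR text is unusable can still be found through its labels, which is the gap labels are meant to cover.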

