
Bug#167743: RFP: prescript -- Utility for extracting text from PostScript files



Package: wnpp
Version: N/A; reported 2002-11-04
Severity: wishlist

* Package name    : prescript
  Version         : 2.2
  Upstream Author : New Zealand Digital Library administrator <nzdl@cs.waikato.ac.nz>
* URL             : http://www.nzdl.org/html/prescript.html
* License         : GPL
  Description     : Utility for extracting text from PostScript files

PostScript conversion to plain ASCII or HTML.
  PreScript is primarily a PostScript to plain text converter, but it can
  also produce rudimentary HTML. Tags are inserted to mark paragraphs (<p>),
  short lines (<br>), page breaks (<hr>), and headers and footers
  (italicized with <i>...</i>).
Paragraph boundary detection.
  PreScript determines the line spacing of a document and uses this (and also
  indentations) to determine paragraph boundaries.
Hyphenation removal.
  Hyphenated words are de-hyphenated.
Ligature translation.
  Most ligatures used by TeX documents are detected. PreScript doesn't track
  font changes, which makes it impossible to reliably detect all ligatures.
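
The paragraph detection and hyphenation removal described above can be
sketched roughly as follows. This is not PreScript's actual code: the
function names, the (y, x, text) line format, and the gap_factor and
indent thresholds are assumptions made up for illustration only.

```python
import re
from statistics import median


def split_paragraphs(lines, gap_factor=1.5, indent=10.0):
    """Group text lines into paragraphs.

    lines is a list of (y, x, text) tuples, with y increasing down the
    page (a hypothetical format). A new paragraph starts when the gap to
    the previous line is clearly larger than the typical line spacing,
    or when the line is indented relative to the left margin.
    """
    if not lines:
        return []
    gaps = [b[0] - a[0] for a, b in zip(lines, lines[1:])]
    typical = median(gaps) if gaps else 0.0
    margin = min(x for _, x, _ in lines)
    paragraphs = [[lines[0][2]]]
    for (prev_y, _, _), (y, x, text) in zip(lines, lines[1:]):
        if y - prev_y > gap_factor * typical or x - margin > indent:
            paragraphs.append([text])
        else:
            paragraphs[-1].append(text)
    return ["\n".join(p) for p in paragraphs]


def dehyphenate(text):
    """Join a word split by a hyphen at the end of a line.

    Only lowercase letters on both sides are joined. A genuine compound
    such as "line-\\nbreak" would be joined as well, so a real tool would
    need a dictionary or similar check.
    """
    return re.sub(r"([a-z])-\n\s*([a-z])", r"\1\2", text)


sample = [
    (0, 0, "First para-"),
    (12, 0, "graph line."),
    (36, 0, "Second block."),
    (48, 20, "Indented start."),
    (60, 0, "continues."),
]
paragraphs = [dehyphenate(p) for p in split_paragraphs(sample)]
```

In this sketch the typical line spacing is taken as the median gap between
consecutive lines, so a few unusually large gaps do not skew the estimate.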

Remark: PreScript is a prerequisite for the package zope-documentlib,
        for which I filed an ITP in bug #167694.  If no one steps in
        here I will have to package it myself, but I would love it if
        someone else took care of this nice piece of software.

-- System Information
Debian Release: 3.0
Architecture: i386
Kernel: Linux wr-linux02 2.4.17 #1 Mit Jan 23 14:00:21 CET 2002 i686
Locale: LANG=de_DE@euro, LC_CTYPE=de_DE@euro



