Bug#167743: RFP: prescript -- Utility for extracting text from PostScript files
Package: wnpp
Version: N/A; reported 2002-11-04
Severity: wishlist
* Package name : prescript
Version : 2.2
Upstream Author : New Zealand Digital Library administrator <nzdl@cs.waikato.ac.nz>
* URL : http://www.nzdl.org/html/prescript.html
* License : GPL
Description : Utility for extracting text from PostScript files
PostScript conversion to plain ASCII or HTML.
PreScript is primarily a PostScript to plain text converter, but it can also
produce rudimentary HTML. Tags are inserted to mark paragraphs (<p>),
short lines (<br>), page breaks (<hr>), and headers and footers (italicized
with <i>...</i>).
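
To illustrate the tag mapping described above, here is a minimal Python
sketch. It is not PreScript's code: the function names and the input
shape (a page as a header, a list of paragraphs, and a footer) are
assumptions made for this example, and every line inside a paragraph is
treated as a "short line" for simplicity.

```python
from html import escape


def page_to_html(header, paragraphs, footer):
    """Render one page; header/footer are strings or None,
    paragraphs is a list of paragraphs, each a list of lines."""
    parts = []
    if header:
        parts.append(f"<i>{escape(header)}</i>")
    for lines in paragraphs:
        # <p> opens a paragraph, <br> separates its lines.
        parts.append("<p>" + "<br>\n".join(escape(line) for line in lines))
    if footer:
        parts.append(f"<i>{escape(footer)}</i>")
    return "\n".join(parts)


def document_to_html(pages):
    """Join pages with <hr> as the page-break marker."""
    return "\n<hr>\n".join(page_to_html(*page) for page in pages)
```

Text is escaped before tagging so that characters such as "<" in the
extracted text cannot be mistaken for markup.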
Paragraph boundary detection.
PreScript determines the line spacing of a document and uses it, together
with indentation, to determine paragraph boundaries.
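
The idea can be sketched in Python as follows. This is only an
illustration under stated assumptions, not PreScript's actual algorithm:
lines are assumed to arrive as (y, x, text) tuples ordered top to bottom
with y growing downward, and the thresholds gap_factor and indent_tol
are hypothetical values chosen for the example.

```python
from collections import Counter


def split_paragraphs(lines, gap_factor=1.5, indent_tol=5.0):
    """Group (y, x, text) lines into paragraphs."""
    if not lines:
        return []
    gaps = [round(b[0] - a[0], 1) for a, b in zip(lines, lines[1:])]
    # Dominant line spacing: the most common gap between consecutive lines.
    typical = Counter(gaps).most_common(1)[0][0] if gaps else 0.0
    # Left margin: the smallest x position seen.
    margin = min(x for _, x, _ in lines)
    paragraphs = [[lines[0][2]]]
    for prev, cur in zip(lines, lines[1:]):
        gap = cur[0] - prev[0]
        indented = cur[1] - margin > indent_tol
        # A larger-than-usual gap or an indented line starts a new paragraph.
        if gap > gap_factor * typical or indented:
            paragraphs.append([cur[2]])
        else:
            paragraphs[-1].append(cur[2])
    return [" ".join(p) for p in paragraphs]
```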
Hyphenation removal.
Words hyphenated at line breaks are rejoined.
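
A minimal Python sketch of such a rejoining step is shown below. It is an
assumption about how this could be done, not PreScript's implementation,
and the function name dehyphenate is made up for the example. Note that
this naive rule also joins genuine compounds that happen to break at the
hyphen (for example "well-" / "known").

```python
import re


def dehyphenate(text):
    """Join a word split as 'docu-' + newline + 'ment' into 'document'."""
    # A word character, a hyphen at end of line, optional leading
    # whitespace on the next line, then another word character.
    return re.sub(r"(\w)-\n\s*(\w)", r"\1\2", text)
```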
Ligature translation.
Most ligatures used by TeX documents are detected. PreScript doesn't track
font changes, which makes it impossible to reliably detect all ligatures.
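
The following Python sketch illustrates ligature expansion and why font
tracking matters. It assumes the text comes from a font using TeX's OT1
text encoding, where character codes 11 to 15 hold the ff, fi, fl, ffi
and ffl ligatures; the same codes mean something else in other fonts, so
a blind mapping like this one can misfire. The names are hypothetical
and this is not PreScript's own table.

```python
# Assumption: OT1-encoded text font (e.g. Computer Modern Roman).
OT1_LIGATURES = {
    0x0B: "ff",
    0x0C: "fi",
    0x0D: "fl",
    0x0E: "ffi",
    0x0F: "ffl",
}


def expand_ligatures(text):
    """Replace OT1 ligature codes with their plain-letter spellings."""
    return text.translate(OT1_LIGATURES)
```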
Remark: PreScript is a prerequisite for the package zope-documentlib,
for which I filed an ITP in bug #167694. If no one steps in here, I will
have to package it myself, but I would love it if someone else would
take care of this nice piece of software.
-- System Information
Debian Release: 3.0
Architecture: i386
Kernel: Linux wr-linux02 2.4.17 #1 Mit Jan 23 14:00:21 CET 2002 i686
Locale: LANG=de_DE@euro, LC_CTYPE=de_DE@euro