Re: edit pdf's

To: debian-user@lists.debian.org
Subject: Re: edit pdf's
From: dircha <dircha@dircha.com>
Date: Tue, 11 May 2004 23:06:07 -0500
Message-id: <40A1A2AF.6080205@dircha.com>
In-reply-to: <20040512030447.GA28437@debian.potter>
References: <1UnRy-QR-189@gated-at.bofh.it> <1UnRy-QR-187@gated-at.bofh.it> <1UEIs-2AI-35@gated-at.bofh.it> <40A0F183.1030705@rcn.com> <20040511170116.GB12295@utoronto.ca> <20040512030447.GA28437@debian.potter>

Kevin Mark wrote:

On Tue, May 11, 2004 at 01:01:16PM -0400, Matt Price wrote:

thanks for the clues folks.  pdftohtml -- which I confess I *did*
already know about, sorry, should have said so -- won't work so well
for me, I don't think; these are scanned-in texts from the JSTOR
journal collection, and it's important I keep the pages in order...

as, er, someone mentioned earlier (don't have the thread in front of
me at the moment), a complex process involving gimp and pdftops seems
to be the best bet, but it's insanely labour-intensive for long
documents, so I may forgo the whole project. thx all though.


you mentioned something that caught my eye, as it relates to a need in
FOSS that a friend of mine has: a replacement for the PAPERPORT
product, which allows for scanning in multipage docs, with the ability
to annotate pages, store OCR data with pages and search the archive,
as well as a 'desktop environment app' that can show the virtual
folders of documents with document thumbnails. PAPERPORT uses pdf as
its new format. Has anyone considered making such an app? There are
many law offices that would like this, as well as people who deal with
large collections of document repositories.


I don't seem to have the root of this thread any longer.

However, have you looked into using pdfimages to extract the images and then gocr to extract the text from the images? You might want netpbm too if you go that route.
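
A rough sketch of that pipeline in Python, in case it helps. It assumes pdfimages (from xpdf) and gocr are both on the PATH and that each scanned page is stored as a single image, which I have not checked against the JSTOR files. The function names and the "page" file prefix are made up for illustration. Images are sorted by the counter pdfimages puts in each file name, so the text comes out in page order.

```python
import re
import subprocess
from pathlib import Path


def pdfimages_cmd(pdf, image_root):
    """Command that writes one PBM/PPM file per embedded image."""
    return ["pdfimages", str(pdf), str(image_root)]


def gocr_cmd(image, text_out):
    """Command that OCRs a single image into a text file."""
    return ["gocr", "-i", str(image), "-o", str(text_out)]


def page_number(path):
    """Counter from names like page-012.pbm, used to keep page order."""
    match = re.search(r"-(\d+)\.(?:pbm|pgm|ppm)$", Path(path).name)
    return int(match.group(1)) if match else -1


def ocr_pdf(pdf, workdir):
    """Extract the images from pdf, OCR each one, return the joined text."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(pdfimages_cmd(pdf, workdir / "page"), check=True)

    # Sort numerically rather than by name so the page order survives
    # even if the counter grows past its zero padding.
    images = sorted(workdir.glob("page-*.p[bgp]m"), key=page_number)

    texts = []
    for image in images:
        out = image.with_suffix(".txt")
        subprocess.run(gocr_cmd(image, out), check=True)
        texts.append(out.read_text(errors="replace"))
    # A form feed between pages keeps the page boundaries visible.
    return "\n\f\n".join(texts)
```

Something like ocr_pdf("article.pdf", "ocr-tmp") would then return the recognized text for the whole document. If gocr does poorly on the raw scans, the netpbm tools could be slotted in between the two steps to clean up or rescale each image first.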


dircha
