
Re: converting pdf to html?



on Wed, Apr 18, 2001 at 04:17:40PM -0400, Alan Shutko (ats@acm.org) wrote:
> Forrest English <forrest@truffula.net> writes:
> 
> > i've got a book that someone has asked me to convert to html.  it's
> > got tons of images, and other things to complicate things...
> 
> It would be easiest and least lossy to convert the original format to
> HTML.  What is the original format?  (Odds are, you didn't start out
> with "emacs book.pdf"....)
> 
> If that's not possible, you have a few options.  
> 
> * You can convert the pdf to text with pdftotext, rip out the images
>   with pdfimages, and write the HTML by hand.
> 
> * You could try <http://freshmeat.net/projects/pdftohtml/> which has
>   fairly impressive output given its limitations.  
> 
> * You could buy Acrobat 5, which claims to be able to do this kind of
>   thing, or buy one of the other Acrobat plugins which convert to RTF,
>   HTML or somesuch.  
> 
> * You could print it out and scan it with OmniPage or a similar OCR tool.
> 
> All of those options suck in various respects... you'll probably find
> it easiest going from the original format.
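
For what it's worth, the first option above (pdftotext plus pdfimages,
then HTML by hand) can be roughed out in a few lines of Python.  This is
only a sketch under assumptions, not a finished converter:  it assumes
both xpdf tools are on the PATH, and the function names, the "book.txt"
and "img" file names, and the paragraph-splitting rule are all made up
for illustration.  Placing images where they belong in the text is still
manual work.

```python
import html
import subprocess
from pathlib import Path


def text_to_html(text, image_names, title="Converted book"):
    """Wrap plain text in minimal HTML; blank-line-separated blocks become <p>."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    # Collapse internal whitespace (including page-break form feeds) and escape.
    body = [f"<p>{html.escape(' '.join(p.split()))}</p>" for p in paragraphs]
    # Assumption: images are simply appended at the end of the page.
    body += [f'<img src="{html.escape(name)}" alt="">' for name in image_names]
    return (
        "<html><head><title>" + html.escape(title) + "</title></head>\n<body>\n"
        + "\n".join(body)
        + "\n</body></html>\n"
    )


def convert(pdf_path, out_dir):
    """Run pdftotext and pdfimages, then write index.html into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    txt = out / "book.txt"  # hypothetical intermediate file name
    subprocess.run(["pdftotext", str(pdf_path), str(txt)], check=True)
    # -j writes JPEG-encoded images as .jpg; other images come out as .ppm/.pbm
    subprocess.run(["pdfimages", "-j", str(pdf_path), str(out / "img")], check=True)
    images = sorted(p.name for p in out.glob("img-*"))
    page = text_to_html(txt.read_text(errors="replace"), images)
    index = out / "index.html"
    index.write_text(page)
    return index
```

Something like convert("book.pdf", "out") would leave out/index.html
next to the extracted images.  Any .ppm or .pbm files would still need
converting to a web-friendly format before a browser will show them.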

I agree with most of this.  Converting from one display format to
another raises the question:  what is the work, and what is its
copyright status?

My preferred conversion tool is the author.  Most originating formats,
regardless of source, can be trivially converted to PDF, PS, and HTML
these days. 

A quick scan of the Debian utilities doesn't turn up anything.  A Google
search does:

  http://www.google.com/search?q=pdf+to+html+conversion&btnG=Google+Search

Cheers.

-- 
Karsten M. Self <kmself@ix.netcom.com>    http://kmself.home.netcom.com/
 What part of "Gestalt" don't you understand?       There is no K5 cabal
  http://gestalt-system.sourceforge.net/         http://www.kuro5hin.org
