
Re: converting pdf to html?



Forrest English <forrest@truffula.net> writes:

> i've got a book that someone has asked me to convert to html.  it's
> got tons of images, and other things to complicate things...

It would be easiest and least lossy to convert the original format to
HTML.  What is the original format?  (Odds are, you didn't start out
with "emacs book.pdf"....)

If that's not possible, you have a few options.  

* You can convert the pdf to text with pdftotext, rip out the images
  with pdfimages, and write the HTML by hand.

* You could try <http://freshmeat.net/projects/pdftohtml/> which has
  fairly impressive output given its limitations.  

* You could buy Acrobat 5, which claims to be able to do this kind of
  thing, or buy one of the other Acrobat plugins which convert to RTF,
  HTML or somesuch.  

* You could print it out and scan it with OmniPage or a similar OCR tool.

All of those options suck in various respects... you'll probably find
it easiest going from the original format.
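
For what it's worth, the first option (pdftotext plus pdfimages, then
hand-written HTML) can be roughed out as a small script.  This is only
a sketch under some assumptions: both tools are on your PATH, the
function names, "book.txt" and the "img" prefix are made up for
illustration, and paragraphs are guessed from blank lines, which real
books will often break.  Placing the images is still left to you.

```python
import html
import subprocess
from pathlib import Path


def extract(pdf_path, out_dir):
    """Run pdftotext and pdfimages; return (text, image file names).

    Assumes both command-line tools are installed. The output names
    used here ("book.txt", "img") are arbitrary choices.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    text_file = out / "book.txt"
    # Usage: pdftotext PDF-file [text-file]
    subprocess.run(["pdftotext", str(pdf_path), str(text_file)], check=True)
    # Usage: pdfimages PDF-file image-root
    # -j saves JPEG-encoded images as .jpg; the rest come out as PPM/PBM
    # and would need converting before a browser can show them.
    subprocess.run(["pdfimages", "-j", str(pdf_path), str(out / "img")],
                   check=True)
    images = sorted(p.name for p in out.glob("img-*"))
    return text_file.read_text(errors="replace"), images


def text_to_html(text, title="Untitled"):
    """Turn pdftotext output into a bare HTML skeleton.

    pdftotext separates pages with form feeds; blank lines are treated
    as paragraph breaks (a heuristic, not a guarantee).
    """
    parts = ["<html><head><title>%s</title></head><body>" % html.escape(title)]
    for page in text.split("\f"):
        for block in page.split("\n\n"):
            para = " ".join(block.split())  # rejoin hard-wrapped lines
            if para:
                parts.append("<p>%s</p>" % html.escape(para))
    parts.append("</body></html>")
    return "\n".join(parts)
```

You would call extract("book.pdf", "out"), feed the returned text to
text_to_html(), and then work through the image list by hand to put
each one where it belongs.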

-- 
Alan Shutko <ats@acm.org> - In a variety of flavors!
You're already carrying the sphere!


