
Re: pdftohtml



on Wed, Dec 01, 2004 at 01:48:33AM +0100, Gerard Robin (jag.robin18@wanadoo.fr) wrote:
> Hello,
> 
> I have a few problems with pdftohtml (unstable) :
> 
> with one pdf file I get a suitable html file but with another one I get an unreadable html file.
> 
> I tried "pdftohtml -c -l 1 file.pdf"  but the output is always unreadable and I get the message:  
> 
> free(): invalid pointer 0x80f02e0!
> Page-1
> 
> 
> However xpdf (or gv) displays correctly this file.pdf.
> 
> I guess that the problem comes out of the feature of this pdf file and
> I would like to know if it 

Note first that 'PDF' isn't a simple file format.  Some PDFs are little
more than marked-up text; others are essentially large image files
(scanned-in faxes from lawyers, such as those posted to Groklaw, are
infamous for this), and those contain no text for a converter like
pdftohtml to extract.

There are also a few different versions of the PDF and PS formats.


If you can post or point to the file you're trying to convert, this
could be helpful.  Knowing how that file was created and with what
tools, ditto.
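
If you're not sure which version the file declares, the first line of
the PDF says so.  A rough sketch for reading it follows; the function
name pdf_header_version is made up for illustration, and it only looks
at the header line, which a newer file may override internally.
pdfinfo (shipped with xpdf) reports the version too, along with the
Creator and Producer fields, which is likely the easier route.

```shell
# Print the version declared in a PDF's header line, e.g. "1.4".
# pdf_header_version is a hypothetical helper name, not an existing tool.
pdf_header_version() {
    # A PDF file starts with "%PDF-" followed by the version number.
    # Read only the first 8 bytes and print the version if they match;
    # print nothing if the file does not start with a PDF header.
    head -c 8 "$1" | sed -n 's/^%PDF-\([0-9]\.[0-9]\).*/\1/p'
}
```

Called as 'pdf_header_version file.pdf', it prints just the version
number, or nothing at all if the file doesn't begin with a PDF header.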

Running 'ps2ps' on a PostScript file sometimes works around bugs that
stymie some viewers (or printers).  It's a roundabout way, but:

   pdf2ps file.pdf file.ps
   ps2ps file.ps file-new.ps
   ps2pdf file-new.ps file-new.pdf
   pdftohtml file-new.pdf file-new.html

...might get you somewhere.  Most likely, though, you'll just end up
with a really broken hash of a file.
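
The four steps above can be chained so that the run stops at the first
one that fails, which at least tells you where things break.  This is
only a sketch: the names pdf_roundtrip and run, and the DRY_RUN
variable, are my own inventions, and the output file names simply
follow the pattern used above.

```shell
# Run a command, or just print it when DRY_RUN is set to a non-empty value.
# (run and DRY_RUN are illustrative names, not part of any tool.)
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# PDF -> PS -> PS -> PDF -> HTML, stopping at the first failing step.
pdf_roundtrip() {
    src=$1
    base=${src%.pdf}    # strip a trailing ".pdf", if there is one
    run pdf2ps "$src" "$base.ps" &&
    run ps2ps "$base.ps" "$base-new.ps" &&
    run ps2pdf "$base-new.ps" "$base-new.pdf" &&
    run pdftohtml "$base-new.pdf" "$base-new.html"
}
```

Call it as 'pdf_roundtrip file.pdf'.  Setting DRY_RUN=1 beforehand only
prints the commands, which is handy for checking the file names first.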


Alternatively, if the source of the PDF file is available, converting
*it* to HTML directly should provide far superior results.
 


Peace.

-- 
Karsten M. Self <kmself@ix.netcom.com>        http://kmself.home.netcom.com/
 What Part of "Gestalt" don't you understand?
   Geek for hire:  http://kmself.home.netcom.com/resume.html
