
Re: How to convert html to PDF?

Quoting Jonas Smedegaard (2017-01-30 18:28:11)
> Quoting Shrinivasan T (2017-01-30 17:52:29)
>> I am looking for a solution to convert HTML to PDF with custom Tamil 
>> language fonts and custom paper size.
> You might also try pandoc (unlikely to work without fine-tuning, but 
> if it works then you can do powerful things like scraping a web page 
> and applying a LaTeX template to produce professional-grade output - like 
> I did with http://source.jones.dk/eut.git/ to produce 
> http://eut.biks.dk/ - a 60+ page research study edited on a wiki and 
> finalized as PDF books optimized for print and "ebook-style" use).

This - using XeLaTeX and XeTeX internally - seems to work:

  pandoc --standalone --latex-engine xelatex -V lang='' -V papersize=a5 -V mainfont="Uni Ila.Sundaram-10" -V margin-left=5mm -V margin-right=5mm -V margin-top=10mm -V margin-bottom=15mm --output manaosai-xetex-ebook.pdf https://ia800203.us.archive.org/31/items/ManaosaiShortStories/Manaosai-short-stories.html

The result contains "tofu" - white blocks indicating characters 
unsupported by the font - which I suspect is caused by your choice of font.
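One way to narrow down which characters might be rendering as tofu is to list the non-ASCII characters the text actually uses, and then compare that list against the font's coverage by hand. The following is only a minimal sketch using the Python standard library; the function names char_inventory and describe are made up for illustration, and the script does not inspect the font itself.

```python
import unicodedata
from collections import Counter


def char_inventory(text):
    """Count every non-ASCII, non-whitespace character found in text."""
    return Counter(ch for ch in text if ord(ch) > 127 and not ch.isspace())


def describe(counter):
    """Return one line per character: code point, Unicode name, count."""
    lines = []
    for ch, count in counter.most_common():
        name = unicodedata.name(ch, "UNKNOWN")
        lines.append(f"U+{ord(ch):04X} {name} x{count}")
    return lines
```

Feeding the text of the HTML file to char_inventory and printing the lines from describe gives a list of code points to check against the font, for instance in a font viewer.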

If my guess is correct, then obviously the best option is to get the 
font corrected (it looks like it is a free font), or to pick another font. 
Alternatively, a filter can be applied to pandoc to replace the 
unsupported characters with something the font supports.  Or, if that's 
an option, simply tell your authors to avoid those characters.
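Such a filter could look roughly like the sketch below. It relies on pandoc's JSON filter interface (the document AST arrives on stdin and is written back to stdout) and needs only the Python standard library. The entries in REPLACEMENTS are placeholders chosen purely for illustration, not the characters actually at fault here; they would have to be replaced with whatever turns out to be missing from the font.

```python
#!/usr/bin/env python3
import json
import sys

# Hypothetical mapping of unsupported characters to substitutes.
# These entries are illustrative assumptions, not a diagnosis.
REPLACEMENTS = {
    "\u200c": "",    # ZERO WIDTH NON-JOINER: drop
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK: plain apostrophe
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK: plain apostrophe
}
TABLE = str.maketrans(REPLACEMENTS)


def replace_chars(node, table):
    """Return a copy of the JSON AST with characters in Str nodes replaced."""
    if isinstance(node, dict):
        # Plain text lives in nodes of the form {"t": "Str", "c": "..."}.
        if node.get("t") == "Str" and isinstance(node.get("c"), str):
            return {"t": "Str", "c": node["c"].translate(table)}
        return {key: replace_chars(value, table) for key, value in node.items()}
    if isinstance(node, list):
        return [replace_chars(item, table) for item in node]
    return node


def main():
    doc = json.load(sys.stdin)
    json.dump(replace_chars(doc, TABLE), sys.stdout)


# pandoc passes the target format as the first argument to a filter,
# so only read stdin when the script is called that way.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Saved under a name of your choice (replace-chars.py here is just an example) and made executable, it would be added to the pandoc command with the --filter option:

  pandoc --filter ./replace-chars.py ...

Only Str nodes are touched, so code blocks and raw content pass through unchanged.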

I did not succeed setting papersize to B6, but that should be possible 
by tuning the LaTeX template used.

This - using wkhtmltopdf and QtWebkit internally - works too:

  pandoc --standalone -t html5 -V papersize=B6 --output manaosai-webkit-ebook.pdf https://ia800203.us.archive.org/31/items/ManaosaiShortStories/Manaosai-short-stories.html

...but it failed when I applied margins (same syntax as for the XeTeX 
renderer), and I didn't attempt to switch fonts (different syntax: option 
--css pointing to a CSS file).  Also, that renderer has no other 
options, so I would prefer the TeX approach myself.

A third pandoc-based approach - using ConTeXt internally - exists too, 
which I didn't explore (but I found a bug in the pandoc package: it is 
missing a suggestion of the "context" package).

 - Jonas

 * Jonas Smedegaard - idealist & Internet-arkitekt
 * Tlf.: +45 40843136  Website: http://dr.jones.dk/

 [x] quote me freely  [ ] ask before reusing  [ ] keep private
