Quoting Jonas Smedegaard (2017-01-30 18:28:11)
> Quoting Shrinivasan T (2017-01-30 17:52:29)
>> I am looking for a solution to convert HTML to PDF with custom Tamil
>> language fonts and custom paper size.
[...]
> You might also try pandoc (unlikely to work without fine-tuning, but
> if it works then you can do powerful things like scraping a web page
> and apply a LaTeX template to produce professional-grade output - like
> I did with http://source.jones.dk/eut.git/ to produce
> http://eut.biks.dk/ - an 60+ pages research study edited on a wiki and
> finalized as PDF books optimized for print and "ebook-style" use.

This approach, using XeLaTeX and XeTeX internally, seems to work:

  pandoc --standalone --latex-engine xelatex -V lang='' -V papersize=a5 \
    -V mainfont="Uni Ila.Sundaram-10" \
    -V margin-left=5mm -V margin-right=5mm \
    -V margin-top=10mm -V margin-bottom=15mm \
    --output manaosai-xetex-ebook.pdf \
    https://ia800203.us.archive.org/31/items/ManaosaiShortStories/Manaosai-short-stories.html

The result contains "tofu" (white blocks marking characters the font does not support). I suspect your choice of font is to blame. If my guess is correct, the best fix is obviously to get the font corrected (it looks like a free font) or to pick another font. Alternatively, a filter can be applied to pandoc to replace the offending characters with something the font does support. Or, if that's an option, simply tell your authors to avoid the offending characters.

I did not succeed in setting papersize to B6, but that should be possible by tuning the LaTeX template used.

This approach, using wkhtmltopdf and QtWebKit internally, works too:

  pandoc --standalone -t html5 -V papersize=B6 \
    --output manaosai-webkit-ebook.pdf \
    https://ia800203.us.archive.org/31/items/ManaosaiShortStories/Manaosai-short-stories.html

...but it failed when I applied margins (same syntax as for the XeTeX renderer). I also didn't attempt to switch fonts, which uses a different syntax: the --css option pointing to a CSS file.
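For the filter route mentioned above, a minimal sketch might look like the following. It is untested against the Manaosai page and was not part of the runs described here. The filter reads pandoc's JSON document tree on stdin, rewrites the text of every Str element, and writes the tree back to stdout. The file name fix-tofu.py and the REPLACEMENTS table are hypothetical placeholders. The table entries are examples, not a diagnosis of which characters actually caused the tofu.

```python
#!/usr/bin/env python3
"""Sketch of a pandoc JSON filter that swaps characters the chosen font
cannot render (shown as "tofu") for ones it can.

Assumption: REPLACEMENTS below is a hypothetical placeholder table; fill it
with the characters that actually come out as tofu in your PDF.
"""
import json
import sys

# Hypothetical mapping: character missing from the font -> substitute.
REPLACEMENTS = {
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
}


def fix_text(text):
    """Return text with every character listed in REPLACEMENTS substituted."""
    return "".join(REPLACEMENTS.get(ch, ch) for ch in text)


def walk(node):
    """Return a copy of the pandoc JSON tree with all Str contents fixed."""
    if isinstance(node, list):
        return [walk(item) for item in node]
    if isinstance(node, dict):
        if node.get("t") == "Str" and isinstance(node.get("c"), str):
            return {"t": "Str", "c": fix_text(node["c"])}
        return {key: walk(value) for key, value in node.items()}
    return node


def main():
    # pandoc reads the filter's stdout back in as the modified document.
    document = json.load(sys.stdin)
    json.dump(walk(document), sys.stdout)


# pandoc runs filters as "FILTER FORMAT" (e.g. "latex"); without that
# argument we do nothing, so the functions above can be imported and tested.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

To try it, make the file executable (chmod +x fix-tofu.py) and add --filter ./fix-tofu.py to the XeLaTeX command. Only Str elements are touched, so text inside code spans or raw HTML blocks would need extra cases.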
Also, the wkhtmltopdf renderer offers no other options, so I would prefer the TeX approach myself.

A third pandoc-based approach, using ConTeXt internally, exists too. I didn't explore it, but I did find a bug in the pandoc package: it is missing a suggestion of the "context" package.

 - Jonas

-- 
 * Jonas Smedegaard - idealist & Internet-arkitekt
 * Tlf.: +45 40843136  Website: http://dr.jones.dk/

 [x] quote me freely  [ ] ask before reusing  [ ] keep private