This works reasonably well:

    curl "https://www.debian.org/social_contract.en.html" |\
    sed '/^<div id="content">/,/^<\/div> <!-- end content -->/{//!b};d' |\
    pandoc -f html -t latex -V geometry:a4paper,margin=2cm -o social_contract.pdf
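To see what the sed step in the middle does, it can be tried on a small inline sample. This is only a sketch: the `sample` markup below is a hypothetical stand-in for the real page, and the one-line `{//!b};d` form assumes GNU sed. The expression keeps the lines strictly between the opening `<div id="content">` line and the closing `</div> <!-- end content -->` line and deletes everything else, including the two delimiter lines.

```shell
# Hypothetical sample page, standing in for the downloaded HTML.
sample='<html>
<div id="content">
<h1>Title</h1>
<p>Body text</p>
</div> <!-- end content -->
<div id="footer">footer</div>
</html>'

# Same sed expression as in the pipeline (GNU sed assumed).
# Inside the range, the empty regex // reuses the last regex that was tried,
# so //!b prints every line that is not one of the two delimiters;
# the trailing d deletes the delimiters and everything outside the range.
extracted=$(printf '%s\n' "$sample" |
  sed '/^<div id="content">/,/^<\/div> <!-- end content -->/{//!b};d')

printf '%s\n' "$extracted"
```

With this sample, only the `<h1>` and `<p>` lines should remain, which is the fragment pandoc would then convert.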