Re: convert html to xml
On Sat, Aug 30, 2025 at 03:22:58PM +0700, Max Nikulin wrote:
> For me it is not uncommon to get PDF files in search results. That is
> why I suspect that something is wrong with your PDFs. Are they
> generated to be sent to a printer or to be published on a web site? Is
> "pdftotext FILE.PDF -" able to extract readable text? Does "pdfinfo
> FILE.PDF" list author, title, etc.? Do links to these files have
> descriptive context?
Max,
I am very grateful for your diagnosis. I was unaware of PDF metadata.
With a bit of searching, I found several authoritative articles on
PDF metadata.
It turns out that the hyperref package for LaTeX provides options, and
documents them, for setting the metadata fields. There is also a paper
by Karl Rupp, "PDF Metadata in LaTeX Documents".
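For anyone else following the thread, here is a minimal sketch of the
hyperref approach. It assumes pdfLaTeX; the title, author, subject and
keyword values are placeholders, not taken from my documents.

```latex
\documentclass{article}
% hyperref's pdftitle, pdfauthor, pdfsubject and pdfkeywords options
% fill in the PDF document-information fields (placeholder values).
\usepackage{hyperref}
\hypersetup{
  pdftitle    = {Example Title},
  pdfauthor   = {A. N. Author},
  pdfsubject  = {Example subject},
  pdfkeywords = {metadata, PDF, LaTeX},
}
\begin{document}
\section{Introduction}
Body text.
\end{document}
```

After compiling, "pdfinfo FILE.PDF" should then report the Title, Author,
Subject and Keywords fields. If the document already uses \title and
\author, loading hyperref with the pdfusetitle option is another way to
copy them into the metadata.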
RLH