
Re: convert html to xml



On Sat, Aug 30, 2025 at 03:22:58PM +0700, Max Nikulin wrote:
For me it is not uncommon to get PDF files in search results. That is why I suspect that something is wrong with your PDFs. Are they generated to be sent to a printer or to be published on a web site? Is "pdftotext FILE.PDF -" able to extract readable text? Does "pdfinfo FILE.PDF" list author, title, etc.? Do links to these files have descriptive context?


Max,

I am very grateful for your diagnosis.  I was unaware of metadata for
PDF.

With a bit of searching, I located several authoritative articles on
metadata for PDF.

It turns out that the hyperref package for LaTeX provides options for
setting the metadata fields, along with instructions for using them.
There is also a paper by Karl Rupp, "PDF Metadata in LaTeX Documents".
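
As a rough illustration of what this looks like, here is a minimal sketch using hyperref's \hypersetup command. The title, author, subject, and keyword values are placeholders I made up for the example, not taken from any real document:

```latex
\documentclass{article}
\usepackage{hyperref}

% Placeholder metadata values, for illustration only.
\hypersetup{
  pdftitle={Example Title},
  pdfauthor={Example Author},
  pdfsubject={Example subject line},
  pdfkeywords={example, metadata, keywords}
}

\begin{document}
Body text goes here.
\end{document}
```

After compiling, "pdfinfo FILE.PDF" should report these values in its Title, Author, Subject, and Keywords fields, which is one way to check that the metadata was actually written.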

RLH

