Re: convert html to xml



On 01/09/2025 11:03, Russell L. Harris wrote:

I do not see in dvipdfm a way to add metadata.

It seems you have found a recipe for adding metadata. In addition, the hyperref manual may be used as a reference.

I do not know whether XMP or PDF/A metadata gives any additional benefit for the rank assigned by search engines.

I am not familiar with dvipdfm. From what I have heard, LuaLaTeX is the currently recommended engine. Developers of some packages may have little motivation to support other engines.

Another point is to make sure that you use scalable (vector) fonts: try x800 magnification in a viewer, or inspect the file with pdffonts.
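
The pdffonts check can be scripted. Below is a minimal sketch, not a definitive tool. It assumes that pdffonts (from poppler-utils or xpdf) prints its usual table: a header row, a row of dashes, then one row per font. It also assumes that a "Type 3" entry in TeX output usually means a bitmap (PK) font, which is a common case rather than a rule. The function names are hypothetical, and very long font names may shift the columns and confuse the parser.

```python
from __future__ import annotations

import re
import subprocess


def parse_pdffonts(report: str) -> list[dict]:
    """Parse the table printed by `pdffonts` into one dict per font.

    Assumption: the row of dashes under the header marks the column
    boundaries, as in the usual pdffonts output.
    """
    lines = report.splitlines()
    # Locate the separator row of dashes; the header is the line above it.
    sep_index = next(
        (i for i, line in enumerate(lines) if line.startswith("---")), None
    )
    if sep_index is None or sep_index == 0:
        return []
    # Each run of dashes gives the (start, end) span of one column.
    spans = [m.span() for m in re.finditer(r"-+", lines[sep_index])]
    headers = [lines[sep_index - 1][a:b].strip() for a, b in spans]
    fonts = []
    for line in lines[sep_index + 1:]:
        if not line.strip():
            continue
        fonts.append({h: line[a:b].strip() for h, (a, b) in zip(headers, spans)})
    return fonts


def bitmap_fonts(fonts: list[dict]) -> list[str]:
    """Return names of fonts reported as Type 3 (often bitmaps in TeX output)."""
    return [f["name"] for f in fonts if f.get("type") == "Type 3"]


def report_for(pdf_path: str) -> str:
    """Run pdffonts on a file and return its text output (needs pdffonts installed)."""
    result = subprocess.run(
        ["pdffonts", pdf_path], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Typical use would be bitmap_fonts(parse_pdffonts(report_for("paper.pdf"))), where "paper.pdf" is a placeholder file name. An empty list suggests that no Type 3 fonts were found, but the x800 magnification check in a viewer is still worth doing.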

Making PDF files convenient for users and search engines is more complex than just adding a couple of options to your favorite TeX engine.

On Mon, Sep 01, 2025 at 10:09:12AM +0700, Max Nikulin wrote:
I do not believe in magic, I expect that other CMS and static site generators may be configured to achieve results similar to WordPress.
[...]
I really have no wish to climb into bed with WordPress.  Tonight I
dump my WordPress documents and notes into the dumpster.

My point is that in a community around another project you may meet a person who will understand your questions as you are able to ask them. Perhaps, while reading the docs for other tools, you will realize what you really need.

As for XML: OpenOffice/LibreOffice ODF files, for example, are XML-based, but that does not mean you can inject them into WordPress (perhaps there is a plugin for this specific task, or copy-paste is enough in most cases). On the other hand, I would not be surprised if the backup/restore XML files generated by blogger.com use the same schema as some WordPress component or plugin.

On 02/09/2025 02:36, Roy J. Tellason, Sr. wrote:
On Sunday 31 August 2025 11:09:12 pm Max Nikulin wrote:
I have no idea if search engines parse links in PDFs.

They apparently do.  A while back I was using some of the tools offered
by google on my website,  and got errors reported that were caused by
links that were inside of pdf files,  until I figured out how to tell
those tools to not look inside of pdf files.

Thanks for the data point. There are still some other "real" search engines.
