
Re: generate a rss.xml from a bunch of HTML files



On 10/05/2021 07:06, Andrei POPESCU wrote:
> On Lu, 10 mai 21, 01:44:32, Emanuel Berg wrote:
>> Charles Curley wrote:
>>
>>> Right. However, as I found out asking elsewhere, you can
>>> include HTML in Markdown.
>> Hehehe, let's see, first write HTML, then include it in
>> Markdown, then have the static site generator generate
>> HTML... brilliant :)
> Surely there must be some site generator with RSS support that takes 
> "plain" HTML as input.

I would guess that there isn't, purely because the task of figuring out
what information to extract is relatively awkward. OK, there are some
easy tasks such as "What is the title of the page?" (<title> tag), "What
is the publication date of the page?" (mtime of the file), but there are
trickier questions: "Who was the author of this page?" (well, we could
hope for a meta tag, and fall back to the user running the tool,
perhaps) and "What's the copyright of the page?" (I'm fairly certain
there's no standard tag for that in HTML). Finally, we come to the
tricky bit: the page summary. Most feeds provide a summary of the page
content to entice readers to read the whole article; one or two
paragraphs should be sufficient. But if you've ever used the "Reader
Mode" of a web browser, or ever pointed a screen reader at a web page,
you'll know that finding the body of the page isn't a 100% accurate task.
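
To illustrate the "easy" questions above, here is a minimal sketch,
assuming Python's standard html.parser module. The names MetaExtractor
and page_metadata are hypothetical, not part of any existing tool, and
the fallbacks (file mtime for the date, the current user for the author)
are just the guesses described above.

```python
import getpass
import os
from datetime import datetime, timezone
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect the <title> text and the <meta name="author"> content."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.author = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "author":
                self.author = attributes.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_metadata(path):
    """Guess title, author and publication date for one HTML file."""
    parser = MetaExtractor()
    with open(path, encoding="utf-8", errors="replace") as fh:
        parser.feed(fh.read())
    parser.close()
    return {
        "title": parser.title.strip(),
        # Assumption: no author meta tag means "whoever runs the tool".
        "author": parser.author or getpass.getuser(),
        # Assumption: the file's mtime stands in for the publication date.
        "pubdate": datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc),
    }
```

Even here the answers are only guesses: editing a typo changes the
mtime, and the user running the tool may not be the author.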

This is why so many site generators prefer you to provide the pieces and
they'll build up the final HTML. HTML *is* supposed to be a semantic
language rather than a presentation language (that is, one could argue
that the first few <p> tags are the first few paragraphs of the page),
but if you're asking for a tool that can parse arbitrary HTML
(including machine-generated HTML), then I don't think it's going to be
easy.
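
The "first few <p> tags" heuristic can be sketched as follows, again
assuming Python's standard html.parser. FirstParagraphs and summarize
are hypothetical names, and this is only the naive approach, not a
reliable way to find the body of a page.

```python
from html.parser import HTMLParser


class FirstParagraphs(HTMLParser):
    """Collect the text of the first `limit` non-empty <p> elements."""

    def __init__(self, limit=2):
        super().__init__()
        self.limit = limit
        self.paragraphs = []
        self._buf = None  # None while we are outside a <p>

    def _flush(self):
        if self._buf is not None:
            # Collapse runs of whitespace, including line breaks.
            text = " ".join("".join(self._buf).split())
            if text and len(self.paragraphs) < self.limit:
                self.paragraphs.append(text)
            self._buf = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # an unclosed <p> ends where the next one starts
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)


def summarize(html, limit=2):
    """Return the first `limit` paragraphs as a plain-text summary."""
    parser = FirstParagraphs(limit)
    parser.feed(html)
    parser.close()
    parser._flush()  # pick up a trailing <p> that was never closed
    return "\n\n".join(parser.paragraphs)
```

On a page whose first <p> happens to be a cookie notice or a
navigation blurb, that text becomes the summary, which is exactly the
difficulty with arbitrary or machine-generated HTML.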

>
> Kind regards,
> Andrei
