
Re: generate a rss.xml from a bunch of HTML files



On Mon, 10 May 2021 Emanuel Berg wrote:
...and this somewhat more complex-looking one...

 "W3C RSS 1.0 News Feed Creation How-To"
 https://www.w3.org/2001/10/glance/doc/howto

Great, but stops on <figure> and <figcaption>,

Elsewhere in the thread you seem to have moved on from XSLT to more
promising options, but I'll make a few comments here anyway.

I suspect that those specific tags are not a primary cause of
difficulty.

I could be mistaken, of course. But I am unable to replicate this
without more information about the input document.

I wonder whether the input document is well-formed XML (no unclosed
tags, etc.).
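One quick way to check that, sketched here with Python's standard-library
xml.etree.ElementTree (my choice of tool for illustration, not something
from this thread; both sample snippets are made up): if this parser
rejects the file, an XSLT processor reading it as XML will most likely
reject it too. Note that <figure> and <figcaption> parse fine as long as
every tag is closed; it is the unclosed tag that breaks things.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if `markup` parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Hypothetical snippet: <figure>/<figcaption> with every tag closed
# (note the self-closing <img .../>), so this is well-formed XML.
closed = (
    '<html><body><figure><img src="tree.jpg" alt="tree"/>'
    "<figcaption>A tree house</figcaption></figure></body></html>"
)

# The same snippet written as typical HTML5: <img ...> is never closed,
# so an XML parser stops at the mismatched </figure>.
unclosed = (
    '<html><body><figure><img src="tree.jpg" alt="tree">'
    "<figcaption>A tree house</figcaption></figure></body></html>"
)

print(is_well_formed(closed))    # True
print(is_well_formed(unclosed))  # False
```

On the real file you would call is_well_formed(open("tree-house.html").read()).
HTML named entities such as &nbsp; also count as errors here, since XML
does not define them.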

these are HTML5 tags:

  http://html5doctor.com/the-figure-figcaption-elements/

so either we must change the XSLT rules to make use of, for example,
at least the caption, _or_ we must write a rule or tell the tool to
ignore them, if such an option exists...

Here is the Makefile [last]. Only one problem: the XSLT file or the
xsltproc tool (?) doesn't really seem to transform the HTML into RSS;
the output is basically a text file with no markup whatsoever, except
for the first line, which is

 <?xml version="1.0" encoding="utf-8"?>

TLDR: What you describe will happen when none of an XSLT stylesheet's
template rules match anything in the input.

XSLT is template-based, a little bit like sed or awk, but instead of
processing records (lines) in a text file it processes the nodes of an
XML tree.

When a node matches no template in your stylesheet, then *built-in*
template rules are applied:

  * The built-in template rule for the "document node" (at the top of
    the tree) is to apply templates to that node's children.

  * The built-in template rule for any element is the same as for the
    document node -- apply templates to the children of that element.

  * And, when we reach the leaves of the tree, the built-in template
    rule for text nodes is to copy the text to the result tree.

As a consequence, applying an XSLT stylesheet to a document that
matches none of the templates in the stylesheet results in output that
looks identical to the output you would get by applying a trivial
stylesheet containing no template rules at all!

It's a little like how the output of

 $ sed '' somefile

is indistinguishable from

 $ cat somefile
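To make the three built-in rules concrete, here is a small Python sketch
(standard library only; the sample page is invented) that walks an
ElementTree the way those rules do: recurse into the children of every
element and copy the text nodes. It is a model of the behaviour, not how
xsltproc is implemented.

```python
import xml.etree.ElementTree as ET


def apply_builtin_templates(elem: ET.Element) -> str:
    """Mimic XSLT 1.0's built-in template rules on an ElementTree element.

    Element (and document) nodes: process the children in document order.
    Text nodes: copy the text to the output.
    Attributes are never visited, because the built-in element rule only
    selects child nodes; the built-in rule for comments and processing
    instructions outputs nothing (ElementTree's default parser drops them).
    """
    # ElementTree keeps the text before the first child in `elem.text`
    # and the text following each child in that child's `tail`.
    parts = [elem.text or ""]
    for child in elem:
        parts.append(apply_builtin_templates(child))
        parts.append(child.tail or "")
    return "".join(parts)


page = ET.fromstring(
    "<html><head><title>Tree house</title></head>"
    "<body><h1>News</h1><p>We built a <em>tree</em> house.</p></body></html>"
)
print(apply_builtin_templates(page))
# prints: Tree houseNewsWe built a tree house.
```

On a real page the whitespace and newlines between tags are text nodes
too, so they are copied as well; the result is the page's text with the
markup stripped, which is pretty much what you describe.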

Maybe I am doing something wrong?

I lack fluency in make/Makefile, and I have not dug into the weeds of
that stylesheet at https://www.w3.org/2001/10/glance/doc/howto .

However, when you call xsltproc, it looks to me like you are not
supplying any of the five parameters that the stylesheet html2rss.xsl
expects:

 <xsl:param name = "Base" />
 <xsl:param name = "Channel" />
 <xsl:param name = "xmlfile" />
 <xsl:param name = "xslfile" />
 <xsl:param name = "Page" />

You might supply them like so

 $ xsltproc -o Overview.rss \
            --stringparam xmlfile "$webpage" \
            --stringparam xslfile html2rss.xsl \
            --stringparam Base "$(dirname "$webpage")" \
            --stringparam Page "$(dirname "$webpage")" \
            --stringparam Channel Overview.rss \
            html2rss.xsl \
            "$webpage"
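Typos in parameter names are easy to miss, so here is a small Python
cross-check (standard library only; declared_params and supplied_params
are hypothetical helpers of my own). It lists the top-level xsl:param
names declared in a stylesheet and the names passed with --param or
--stringparam on an xsltproc command line. The stylesheet text below
contains only the five xsl:param lines quoted above, and the command is
a variant with xmldata and xlsfile in place of xmlfile and xslfile.

```python
import shlex
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def declared_params(stylesheet_xml: str) -> set[str]:
    """Names of the top-level <xsl:param> elements in a stylesheet."""
    root = ET.fromstring(stylesheet_xml)
    return {p.get("name") for p in root.findall(f"{{{XSL_NS}}}param")}


def supplied_params(command: str) -> set[str]:
    """Names given to --param/--stringparam in an xsltproc command line."""
    args = shlex.split(command)  # split roughly the way a POSIX shell would
    return {
        args[i + 1]
        for i, arg in enumerate(args[:-1])
        if arg in ("--param", "--stringparam")
    }


# Only the five xsl:param lines quoted above, wrapped in a stylesheet element.
stylesheet = f"""<xsl:stylesheet version="1.0" xmlns:xsl="{XSL_NS}">
  <xsl:param name = "Base" />
  <xsl:param name = "Channel" />
  <xsl:param name = "xmlfile" />
  <xsl:param name = "xslfile" />
  <xsl:param name = "Page" />
</xsl:stylesheet>"""

# Hypothetical command with two misspelled parameter names.
command = (
    'xsltproc -o Overview.rss --stringparam xmldata "$webpage"'
    " --stringparam xlsfile html2rss.xsl"
    ' --stringparam Base "$(dirname "$webpage")"'
    ' --stringparam Page "$(dirname "$webpage")"'
    ' --stringparam Channel Overview.rss html2rss.xsl "$webpage"'
)

declared = declared_params(stylesheet)
supplied = supplied_params(command)
print("declared but not supplied:", sorted(declared - supplied))
print("supplied but not declared:", sorted(supplied - declared))
# declared but not supplied: ['xmlfile', 'xslfile']
# supplied but not declared: ['xlsfile', 'xmldata']
```

For the real stylesheet, pass open("html2rss.xsl").read() instead of the
excerpt.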

name = tree-house

src = ${name}.html

srcpp = ${name}-pp.html

trans = html2rss.xsl

dst = ${name}.rss

opts = --html

all: ${dst}

${srcpp}: ${src}
	sed -e 's/<\/*fig\(ure\|caption\)>//g' $< > $@

${dst}: ${srcpp}
	xsltproc -o $@ ${opts} ${trans} $<

--
Ce qui est important est rarement urgent
et ce qui est urgent est rarement important
-- Dwight David Eisenhower

