
Bug#242020: www.debian.org: security/dsa-long.en.rdf has HTML markup in <description> tag



Gerfried Fuchs <alfie@ist.org> writes:

> * Mario Lang <mlang@debian.org> [2004-04-04 13:19]:
>> I just added http://www.debian.org/security/dsa-long.en.rdf to my RSS
>> feed aggregator and realized that it does behave strangely.
>
>  It might be your feed aggregator -- it works perfectly on
> planet.debian.net.

No, what is actually happening is that my aggregator (Gnus)
does not parse HTML inside <description> tags, which is a reasonable
thing to do, since documents like
<URL:http://www.mnot.net/rss/tutorial/>

advise content publishers to refrain from using HTML markup there:

"    * Encoding HTML - Although it's tempting, refrain from including
       HTML markup (like <a href="...">, <b> or <p>) in your RSS feed;
       because you don't know how it will be presented, doing so can
       prevent your feed from being displayed correctly."

I am assuming it works on planet.debian.net simply because the escaped
HTML (&lt;a href="..."&gt;) ends up being parsed by browsers for good
measure.  However, I'd say that this is not valid HTML anyway.

>> AIUI, the description tag is not supposed to contain ordinary HTML markup
>> in RSS 1.0.
>
>  Thats why they are escaped and put in there as entities.

But then, you are simply hoping for something to interpret this mess.
If an aggregator does not, the resulting description text simply
looks ugly and is hard to read.
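
To illustrate, here is a minimal Python sketch of what an aggregator
that does not interpret the content ends up with.  The item text and
the helper name description_text are made up for this example, and
the RDF namespaces of a real RSS 1.0 feed are left out for brevity:

```python
import xml.etree.ElementTree as ET

# Hypothetical item modelled on the escaped markup discussed above;
# it is not copied from dsa-long.en.rdf.
ITEM = (
    "<item>"
    "<description>&lt;p&gt;See &lt;a href=\"http://example.org/\"&gt;"
    "the advisory&lt;/a&gt;.&lt;/p&gt;</description>"
    "</item>"
)


def description_text(item_xml: str) -> str:
    """Return the character data of <description> as an XML parser sees it."""
    return ET.fromstring(item_xml).findtext("description")


# The XML parser resolves the entities once, so the tags come back as
# literal text unless the aggregator runs a second, HTML-aware pass.
print(description_text(ITEM))
```

An aggregator that shows this string verbatim displays the tags
themselves, which is the behaviour I am seeing.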

>> Since it is ment as a teaser anyway, and interested people are
>> supposed to follow the link (thats the rss design), I think it would
>> not hurt to be more standards compliant and simply strip well-known
>> HTML constructs.
>
>  No, please not. From what I understand it HTML is allowed in there if
> it is encoded as entities.

I continue quoting from the same page:

"      If you need to include a tag in the text of the feed (e.g.,
       the title of an item is "Ode to <title>"), make sure you escape
       ampersands and angle brackets (so that it would be "Ode to
       &lt;title&gt;")."

However, this does not say "Use ordinary HTML markup to identify links
and paragraphs".

The problem is that some aggregators might be able to parse escaped HTML
markup, but this is simply not specified in the RSS standard, so
aggregators are not required to do so.

> But not in this way, sorry. Especially because
>
>> Index: english/template/debian/recent_list.wml
>> ===================================================================
>>                  $moreinfo =~ s/</&lt;/g;
>>                  $moreinfo =~ s/>/&gt;/g;
>>                  $moreinfo =~ s/"/&quot;/g;
>
>  you leave this escaping in it still.

Of course I do, for the reasons mentioned above.  It should continue
to escape < and > and the like; however, we should strip out anchor
and paragraph start/end tags.
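
As a rough sketch of the idea (in Python rather than the Perl used in
recent_list.wml, and not the patch itself), stripping comes first and
escaping second.  The function name teaser and the regular expression
are my own assumptions for illustration:

```python
import re
from html import escape

# Matches opening and closing <a ...> and <p ...> tags only.  Other
# markup such as <b> or <abbr> is deliberately left for the escaping step.
_ANCHOR_OR_PARA = re.compile(r"</?(?:a|p)(?:\s[^>]*)?>", re.IGNORECASE)


def teaser(moreinfo: str) -> str:
    """Strip anchor/paragraph tags, then escape what is left."""
    stripped = _ANCHOR_OR_PARA.sub("", moreinfo)
    # html.escape also escapes '&' and the single quote, which is
    # slightly more than the three substitutions quoted above.
    return escape(stripped, quote=True)


print(teaser('<p>See <a href="http://example.org/">DSA</a> for "details".</p>'))
```

The link text survives as plain text, and anything that is not an
anchor or paragraph tag still ends up escaped.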

>  Your patch is quite dirty in that respect. Either strip them off
> completely or do encode them correctly....

I disagree.

-- 
CYa,
  Mario | Debian Developer <URL:http://debian.org/>
        | Get my public key via finger mlang@db.debian.org
        | 1024D/7FC1A0854909BCCDBE6C102DDFFC022A6B113E44
