
Bug#242020: www.debian.org: security/dsa-long.en.rdf has HTML markup in <description> tag



Gerfried Fuchs <alfie@ist.org> writes:

> * Mario Lang <mlang@debian.org> [2004-04-04 16:07]:
>> Gerfried Fuchs <alfie@ist.org> writes:
>>> * Mario Lang <mlang@debian.org> [2004-04-04 13:19]:
>>>> AIUI, the description tag is not supposed to contain ordinary HTML markup
>>>> in RSS 1.0.
>>>
>>>  That's why they are escaped and put in there as entities.
>> 
>> But then, you are simply hoping for something to interpret this mess.
>> If an aggregator does not, the resulting description text does simply
>> look ugly and is hard to read.
>
>  Have you taken a look at *any* of the feeds that are used on
> <http://planet.debian.net/>? They do *all* include escaped HTML tags in
> them.

I admit I don't use planet.debian.net at all, but the comparison is not
as striking for me as it appears to be for you.  Most feeds I use
do not use escaped HTML inside description tags.  In fact,
Debian Security (long) is the first one I have encountered that does.
(I admit I haven't used much more than the W3C's and the RFC feed up until now.
 Slashdot and LWN, which have no <description> field at all, do not count here.)

> And I think it is a good thing.

I'd prefer it if the goal (which is indeed a good thing) were implemented
in a correct way.  Given that RSS is already XML, and there are ways
to define RSS extension modules, I simply think there is no need to
bloat the existing <description> field with possibly unreadable cruft.

>>>  No, please not. From what I understand, HTML is allowed in there if
>>> it is encoded as entities.
>> 
>> I continue quoting from the same page:
>> 
>> "      If you need to include a a tag in the text of the feed (e.g.,
>>        the title of an item is "Ode to <title>"), make sure you escape
>>        ampersands and angle brackets (so that it would be "Ode to
>>        &lt;title&gt;")."
>
>  And this isn't done. Those tags _are_ escaped, thank you.

The point is that special characters like <, >, " and the like need to be
escaped when they are part of the description text, so that XML parsing
doesn't break.  This does not mean that they should be used to embed other
markup languages.
You criticized the fact that my patch left this in.  I was simply observing
that this is not a problem but a necessity.
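
As a minimal sketch of that distinction (Python, standard library only;
the `parses` helper and the sample <item> strings are made up for
illustration and are not taken from the feed or from my patch): escaping
is what keeps plain text containing < and > from breaking the XML
parser, and after parsing the reader gets the plain text back.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def parses(document):
    """Return True if `document` is well-formed XML, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


title = "Ode to <title>"  # plain text that happens to contain angle brackets

# Unescaped, the stray <title> opens a nested element that is never closed.
broken = "<item><title>" + title + "</title></item>"

# Escaped, the same text is ordinary character data.
escaped_title = escape(title)
fixed = "<item><title>" + escaped_title + "</title></item>"

# After parsing, the reader gets the original plain text back.
recovered = ET.fromstring(fixed).findtext("title")
```

Here the escaping exists purely to protect the XML layer; nothing in it
asks the client to interpret the recovered string as markup.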

>> However, this is not saying "Use ordinary html markup to identify links
>> and paragraphs".
>
>  And it doesn't say the contrary, like you insist.

Actually, if you had read both quotes I pasted, it does say that.
Granted, it does not forbid it explicitly either, but I guess what we are
talking about here is common sense.  Of course you can embed all kinds of
markup into the description by using
&lt;img src=&quot;http://traffic.net&quot;&gt; and the like, but this is
contrary to what the description field was originally meant for.  The
document I quoted simply tries to emphasize this, and calls on people to
avoid it, to prevent the mess from growing.

>> The problem is that some aggregators might be able to parse escaped HTML
>> markup, but it is simply not specified in the RSS standard, and so, aggregators
>> are not required to.
>
>  Maybe another plaintext feed helps, then. But I am still not convinced
> that this is something that rss wasn't meant to offer, sorry.

I don't think that a separate plaintext feed is the correct way to go
about this.  I am still convinced that the original RSS 1.0 <description>
tag was meant to be used for plain text only.
I always thought that Debian should set an example when it comes
to following sensible guidelines when implementing new technology.
Nevertheless, I've meanwhile found a way to convince my aggregator
to "strip" this unnecessary markup (for reference, in the Gnus summary
buffer, use `W h' to "wash html"[1].)
However, I still think that the current implementation is the wrong
way to go, since it requires perfectly standards-compliant RSS 1.0
clients to either contain a complete parser for escaped HTML or
strip common HTML tags in order to present the actual information
in a meaningful way to the user, which looks like a completely broken
approach to me.
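
To illustrate the extra work this pushes onto every client, here is a
rough sketch (Python, standard library only) of the "strip common HTML
tags" option.  The names `TagStripper` and `plain_description` and the
sample item are hypothetical, the RSS 1.0 namespaces are left out for
brevity, and this is not how Gnus or any particular aggregator does it.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collect only the text between tags, discarding the tags themselves."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def text(self):
        # Join the pieces and collapse runs of whitespace.
        return " ".join("".join(self.chunks).split())


def plain_description(item_xml):
    """Return the <description> of an item with any embedded HTML removed."""
    item = ET.fromstring(item_xml)  # the XML parser undoes the entity escaping
    raw = item.findtext("description", default="")
    stripper = TagStripper()
    stripper.feed(raw)  # second pass: treat the text as HTML
    stripper.close()
    return stripper.text()


# Made-up sample item with escaped HTML in its description.
ITEM = (
    "<item><description>&lt;p&gt;A buffer overflow was found in "
    '&lt;a href="http://example.org/"&gt;foo&lt;/a&gt;.&lt;/p&gt;'
    "</description></item>"
)

print(plain_description(ITEM))
```

Note that the description has to be parsed twice, once as XML and once
as HTML, and that the link target is lost in the process.  A feed that
carried plain text in <description> would need only the first pass.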

I'd like to leave this bug open for further discussion on this matter if
you don't mind.  After all, it is just wishlist.

[1] Something which is also used when one gets an HTML-formatted e-mail
without proper MIME type information.  This strikes me as a very nice
comparison to further illustrate my case.  Such e-mails are also broken in
a sense; still, one could argue that clients only need to parse the content
correctly.

-- 
CYa,
  Mario | Debian Developer <URL:http://debian.org/>
        | Get my public key via finger mlang@db.debian.org
        | 1024D/7FC1A0854909BCCDBE6C102DDFFC022A6B113E44
