
HTML->XHTML transition [Re: I feel to complain again]

On Sun, Aug 05, 2007 at 02:03:10PM +0300, NAGY Viktor wrote:
> wojtekz@comp.waw.pl wrote:
>>  -<hr>
>> +<hr />
> I'm wondering if we have a proper plan for switching to xhtml, or this 
> markup polishing is just a substitute activity -- in the sense that yes, we 
> know that xhtml requires closing all elements so let's close them 
> occasionally but we don't really see how and when we can switch to xhtml.

I'd like to know this too! The above change is bad IMHO because it means 
the page is *neither* valid XHTML (wrong doctype) nor valid HTML 4 strict 
(see explanation below [0]).

Jutta invested a lot of work into making the pages validate, so unless 
someone really wants to convert the entire site within the next few weeks 
(including changing the doctype), such problematic changes should not be made.

BTW, AFAIK "<br></br>" is fine for both SGML and XHTML standards-wise. (The 
W3C _recommends_ using <br/> in this case, but does not mandate it, see 
<http://www.w3.org/TR/REC-xml/#IDAK0FS>.) But I would advocate moving to 
XHTML directly, instead of using "<br></br>" as a workaround.



[0] The W3C validator will actually report that the page *is* valid HTML 4 
strict, but that is not really correct: <br/> is not a valid SGML tag. The 
reason why the W3C validator will not flag it as an error lies in the 
ugliness of SGML. SGML supports so-called net tags, a shorthand which 
allows you to write this:
  <p/This is the content of the paragraph./
instead of this:
  <p>This is the content of the paragraph.</p>

So, to the validator the character sequence "<br/>" means "a <br> tag, 
whose content follows after the '/', the content being the character '>'." 
(Apparently a newline is allowed instead of the second '/'? I don't know.)

However, <br> is defined as having empty content, so as soon as the '>' 
following the <br> tag is seen, the SGML parser implicitly adds a closing 
</br>. As a result, in SGML the characters "<br/>" are equivalent to 
"<br></br>&gt;" - not what you usually mean when you write "<br/>"!! :-(
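
To make the expansion concrete, here is a toy sketch in Python of the rule 
as described above. It is not a real SGML parser: the function name 
expand_net_tags, the EMPTY_ELEMENTS set and the regular expression are all 
made up for illustration, and the output simply mirrors the notation used 
in this mail.

```python
import re

# Toy model only: real SGML parsing depends on the DTD and the SGML
# declaration. EMPTY_ELEMENTS is an assumed, partial list of elements
# that HTML 4 declares with empty content.
EMPTY_ELEMENTS = {"br", "hr", "img"}

# "<name/" with the '/' directly after the element name starts a net tag.
NET_START = re.compile(r"<([A-Za-z][A-Za-z0-9]*)/")


def expand_net_tags(text, empty_elements=EMPTY_ELEMENTS):
    """Rewrite net-tag shorthand into explicit start and end tags."""
    out = []
    pos = 0
    while (m := NET_START.search(text, pos)) is not None:
        name = m.group(1)
        out.append(text[pos:m.start()])  # copy text before the net tag
        pos = m.end()
        if name.lower() in empty_elements:
            # Empty content: the element is closed right away, so a
            # following '>' is ordinary character data.
            out.append(f"<{name}></{name}>")
            if text.startswith(">", pos):
                out.append("&gt;")
                pos += 1
        else:
            # Content runs up to the next '/', which closes the element.
            end = text.find("/", pos)
            if end == -1:
                content, pos = text[pos:], len(text)
            else:
                content, pos = text[pos:end], end + 1
            out.append(f"<{name}>{content}</{name}>")
    out.append(text[pos:])  # copy whatever follows the last net tag
    return "".join(out)


print(expand_net_tags("<p/This is the content of the paragraph./"))
print(expand_net_tags("<br/>"))
```

The first call prints the <p>...</p> form shown earlier, the second prints 
"<br></br>&gt;". Ordinary tags such as "<p>" or "</p>" do not match the 
pattern and are passed through unchanged.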

See <http://www.cs.tut.fi/~jkorpela/html/empty.html> for a great in-depth 
explanation. However, the details are horrible, you probably don't want to 
bother trying to understand them! ;-)



  __   _
  |_) /|  Richard Atterer     |  GnuPG key: 888354F7
  | \/¯|  http://atterer.net  |  08A9 7B7D 3D13 3EF2 3D25  D157 79E6 F6DC 8883 54F7
  ¯ '` ¯
