
Bug#98811: about XHTML compliance



On Mon, Dec 17, 2001 at 10:51:12AM +0100, Norbert Bottlaender-Prier wrote:
> 17/12/2001 04:36:03, "James A. Treacy" <treacy@debian.org> wrote:
> >...
> >Should the first line be changed:
> ><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
> 
> Hi James,
> 
> As the XHTML 1.0 specification adds no extensions to the HTML 4.01 specs
> but only some restrictions, all XHTML 1.0 docs are supposed to comply
> with HTML 4.01 as well, so there's no need to apply this change
> immediately.

Hi Norbert,

XHTML 1.0 documents are not HTML 4.01 compliant: check any XHTML 1.0 document
with validator.w3.org after selecting the HTML 4.01 DTD.
For instance, the trailing slash in empty elements is invalid in HTML 4.01.

> I think the best way to proceed is first to apply the four changes (a)
> through (d), then pass a representative selection of the files thus built
> through the W3C validator http://validator.w3.org/file-upload.html (or
> some other program)... (specifying an XHTML 1.0 doctype manually) and
> if there are no errors, you can change the doctype declaration.

I would like to clarify an important point: changes (a)-(d) do not affect
the generated HTML pages unless WML is run with some special flags.

My proposal was to commit these changes to the template files to allow further
testing (without changing their doctype).  Once that is done, people interested
in migrating to XHTML 1.0 could build XHTML 1.0 pages locally and fix the broken
ones (e.g. add </p>, quotes, trailing slashes, ...).  Meanwhile these pages would
continue to be built on klecker.d.o with the HTML 4.01 doctype.
When all pages are fixed, we would have a good idea of the pros and cons of
providing XHTML 1.0 documents, and could then decide whether to switch.

Here are some details about the differences between HTML and XHTML, and about
the changes needed in input files so that WML pass 2 (a.k.a. mp4h) can produce
either HTML or XHTML files, depending on command-line flags.
First of all, remember that mp4h is a text processor: it has no knowledge of
the HTML specs and treats HTML tags as unknown tags.

 1. Capitalization
  * In HTML 4, tag names and attributes are case-insensitive
  * In XHTML, tag names and attributes are case-sensitive and
    must be written in lowercase letters.
    Note: the DOCTYPE line is not an element, and remains capitalized.
  * By default, mp4h tags are case-insensitive.

Conclusion: tag names and attributes must be written in lowercase letters.
  There is no need to change user-defined macros, because they are expanded
  by mp4h and won't appear in output files.
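
As a rough illustration, a check like the one below could list tag and
attribute names that are still not lowercase in a page.  It is a minimal
Python sketch assuming a plain regular-expression scan is good enough; the
function name uppercase_names is hypothetical and not part of WML or mp4h,
and it ignores comments and '>' characters inside attribute values.  The
DOCTYPE line is skipped because no letter follows its '<'.

```python
import re

# A start or end tag: optional '/', the tag name, then the raw attribute text.
# A DOCTYPE line starts with '<!' and is therefore never matched.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>")
# An attribute name: preceded by whitespace and followed by '='.
ATTR_RE = re.compile(r"(?:^|\s)([A-Za-z][A-Za-z0-9_:-]*)\s*=")


def uppercase_names(fragment):
    """Return tag and attribute names that are not written in lowercase.

    Rough heuristic: comments, CDATA sections and '>' inside quoted
    attribute values are not handled.
    """
    found = []
    for _slash, name, attrs in TAG_RE.findall(fragment):
        if name != name.lower():
            found.append(name)
        for attr in ATTR_RE.findall(attrs):
            if attr != attr.lower():
                found.append(attr)
    return found


print(uppercase_names('<P ALIGN="center">Hi</P>'))  # ['P', 'ALIGN', 'P']
```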

 2. Optional end tags and empty elements
  * In HTML 4, some end tags are optional (e.g. </p>, </li>, ...)
    or forbidden (</img>, </hr>, ...)
  * In XHTML, start and end tags are mandatory, and there is a special
    construct for empty elements: <img src="logo.gif" alt="Logo" />
    Appendix C of the XHTML spec recommends adding a space before the
    trailing slash to improve compatibility with old browsers.
  * mp4h has no notion of optional end tags; it only knows simple (i.e.
    without end tag) and complex tags.  By default, mp4h treats unknown tags
    as simple tags.  When run with the --expansion=0 flag, unknown tags
    are treated as complex tags unless they contain a trailing slash.

Conclusion: always write HTML end tags, even when they are optional.  This
  requires some care, because a misplaced </p> or other end tag could
  produce an invalid document.  When in doubt, check with a validator.
  Always add a trailing slash to empty elements, whether HTML tags or
  user-defined macros; by default it is a no-op when generating HTML files
  (i.e. it does not appear in output files), and it tells the mp4h parser not
  to search for an end tag when the --expansion=0 flag is set (it then
  appears in output files).
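
Since XHTML must be well-formed XML, any XML parser can serve as a quick
first check on locally built pages before running the real validator.  The
Python sketch below uses the standard xml.etree.ElementTree module; the
helper name is_well_formed is hypothetical.  It only checks well-formedness,
not validity against the XHTML DTD, and HTML named entities such as &nbsp;
are undeclared without the DTD, so they would be reported as errors too.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML inside a wrapper element.

    This only checks well-formedness, not validity against the XHTML DTD.
    """
    try:
        # Wrap the fragment so that it has a single root element.
        ET.fromstring("<div>" + fragment + "</div>")
    except ET.ParseError:
        return False
    return True


print(is_well_formed('<p>Two <img src="logo.gif" alt="Logo" /></p>'))  # True
print(is_well_formed('<p>Line<br></p>'))  # False: <br> is never closed
print(is_well_formed('<p>One<p>Two'))     # False: missing </p> end tags
```

The same check also rejects unquoted attribute values, which are covered
next.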

 3. Quotes
  * In HTML 4, quotes around attribute values are sometimes optional
  * In XHTML, they are mandatory
  * mp4h preserves quotes in unknown tags

Conclusion: attribute values must be explicitly quoted in HTML tags.
  There is no need to change user-defined macros, because they are expanded
  by mp4h and won't appear in output files.
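
For existing input files, unquoted values could be fixed mechanically.  The
sketch below is a hypothetical Python helper (quote_attributes is an
illustrative name, not a WML feature).  It relies on the HTML 4.01 rule that
unquoted values may only contain letters, digits, hyphens, periods,
underscores and colons; it may still misfire on quoted values that contain
" name=value" text themselves, so its output should be checked with a
validator.

```python
import re

# A start tag: '<' followed by a letter, up to the next '>'.
TAG_RE = re.compile(r"<[A-Za-z][^>]*>")
# An attribute with an unquoted value.  HTML 4.01 only allows unquoted values
# made of letters, digits, '-', '.', '_' and ':'.
UNQUOTED_RE = re.compile(r"(\s[A-Za-z][A-Za-z0-9_:-]*)=([A-Za-z0-9._:-]+)(?=[\s/>])")


def quote_attributes(text):
    """Wrap unquoted attribute values of start tags in double quotes."""
    def fix_tag(match):
        return UNQUOTED_RE.sub(r'\1="\2"', match.group(0))
    return TAG_RE.sub(fix_tag, text)


print(quote_attributes('<td align=center width=50>x</td>'))
# <td align="center" width="50">x</td>
```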

Denis


