Bug#212249: docbook2html: not accepting \n as whitespace



Package: docbook-utils
Version: 0.6.12-2
Severity: normal

I am seeing a strange misfeature when generating HTML from DocBook.
The parser does not treat a newline as whitespace, and seems to
include it in the HTML file.  I made a small example to demonstrate
the problem.  I believe these two XML files should generate the
same result:

File 1:

<?xml version="1.0" encoding="ASCII"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" []>
<book lang="en">
  <bookinfo>
    <title>T</title>
  </bookinfo>
  <chapter>
    <title>T1</title>
    <sect1>
      <title>T12</title>
      <para></para>
      <para>P</para>
      <para>P1</para>
      <para>P12</para>
      <para>P123</para>
    </sect1>
    <sect1>
      <title>T123</title>
      <para>P1234</para>
    </sect1>
  </chapter>
</book>

File 2:

<?xml version="1.0" encoding="ASCII"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" []>
<book lang="en">
  <bookinfo>
    <title>
T
    </title>
  </bookinfo>
  <chapter>
    <title>
T1
    </title>
    <sect1>
      <title>
T12
      </title>
      <para>
      </para>
      <para>
P
      </para>
      <para>
P1
      </para>
      <para>
P12
      </para>
      <para>
P123
      </para>
    </sect1>
    <sect1>
      <title>
T123
      </title>
      <para>
P1234
      </para>
    </sect1>
  </chapter>
</book>

The only difference is the newline between the tags and the content.
When I generate HTML from these two sources using 'docbook2html
--nochunks', the HTML output differs like this:

--- test.en.html        2003-09-13 23:41:04.000000000 +0000
+++ test.en2.html       2003-09-13 23:41:23.000000000 +0000
@@ -2,7 +2,8 @@
 <HTML
 ><HEAD
 ><TITLE
->T</TITLE
+>&#13;T
+    </TITLE
 ><META
 NAME="GENERATOR"
 CONTENT="Modular DocBook HTML Stylesheet Version 1.7"></HEAD

Notice the extra '&#13;' inserted in front of the book title.  Why is
this so?  Is it a bug in the parser, or something else?
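
To make clear what I mean by "the same result", here is a minimal
Python sketch.  It is not part of docbook-utils; the helper name
normalized_text and the two cut-down documents are made up for
illustration only.  It assumes that leading and trailing whitespace
inside <title> and <para> is insignificant, and compares the text of
each element after collapsing whitespace runs:

```python
import xml.etree.ElementTree as ET

# Cut-down stand-ins for File 1 and File 2 (hypothetical, DOCTYPE omitted).
COMPACT = (
    "<book><bookinfo><title>T</title></bookinfo>"
    "<chapter><title>T1</title><para>P</para></chapter></book>"
)
SPREAD = """<book>
  <bookinfo>
    <title>
T
    </title>
  </bookinfo>
  <chapter>
    <title>
T1
    </title>
    <para>
P
    </para>
  </chapter>
</book>"""


def normalized_text(xml_source):
    """Return (tag, text) pairs with whitespace runs collapsed and trimmed."""
    root = ET.fromstring(xml_source)
    # str.split() with no argument splits on any whitespace, including
    # newlines and carriage returns, and drops empty leading/trailing parts.
    return [(el.tag, " ".join((el.text or "").split())) for el in root.iter()]


print(normalized_text(COMPACT) == normalized_text(SPREAD))
```

With this kind of normalization the two layouts compare equal, which
is the behaviour I expected from the HTML output as well.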

-- System Information
Debian Release: 3.0
Architecture: i386
Kernel: Linux minerva.hungry.com 2.4.19-386 #1 Mon Nov 18 21:50:03 EST 2002 i686
Locale: LANG=no_NO, LC_CTYPE=no_NO

Versions of packages docbook-utils depends on:
ii  docbook-dsssl            1.76-1          Modular DocBook DSSSL stylesheets,
ii  jadetex                  3.12-2          LaTeX macros for SGML to DVI/PS/PD
ii  links                    0.96.20020409-2 Character mode WWW browser
ii  lynx                     2.8.4.1b-3.2    Text-mode WWW Browser
ii  perl                     5.6.1-8.3       Larry Wall's Practical Extraction
ii  sgmlspl                  1.03ii-20       SGMLS-based example Perl script fo
ii  sp                       1.3.4-1.2.1-28  James Clark's SGML parsing tools


