
Re: Brokenness of DocBook XSL toolchain


/ Aaron Isotton <aaron@isotton.com> was heard to say:
| The transformations work as follows:
| - XML -> FO using
| /usr/share/sgml/docbook/stylesheet/xsl/nwalsh/fo/docbook.xsl.  This part
| works fine, BUT it doesn't respect /etc/papersize.  xmlto has a hack in
| it to do that, but other problems (follow).

There's no obvious way to make the base stylesheets respect
/etc/papersize. The papersize can be passed as a parameter, and that
parameter can come from /etc/papersize, but the stylesheet itself
can't look there. (XSL stylesheets can't load non-XML documents.)
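A minimal sketch of that approach in Python: read /etc/papersize outside the stylesheet and hand the result to xsltproc as a parameter. It assumes the DocBook XSL FO parameter `paper.type` (values such as `A4` and `USletter`) and xsltproc's `--stringparam` option. The mapping table, the fallback to `USletter`, and the function names are illustrative assumptions, not part of any Debian package.

```python
from pathlib import Path

# Assumed mapping from /etc/papersize names to DocBook XSL paper.type values.
PAPER_TYPES = {"letter": "USletter", "a4": "A4", "a5": "A5", "a3": "A3"}


def parse_papersize(text, default="letter"):
    """Return the first word that is not blank or a '#' comment, lowercased."""
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            return line.split()[0].lower()
    return default


def read_papersize(path="/etc/papersize", default="letter"):
    """Read the system paper size, falling back to default if the file is missing."""
    try:
        return parse_papersize(Path(path).read_text(), default)
    except OSError:
        return default


def xsltproc_command(xml_file, fo_file, stylesheet, papersize):
    """Build an xsltproc argument list that passes paper.type to the stylesheet."""
    # Unknown names fall back to USletter (an assumption, not DocBook behaviour).
    paper_type = PAPER_TYPES.get(papersize, "USletter")
    return ["xsltproc", "--stringparam", "paper.type", paper_type,
            "-o", fo_file, stylesheet, xml_file]


if __name__ == "__main__":
    stylesheet = "/usr/share/sgml/docbook/stylesheet/xsl/nwalsh/fo/docbook.xsl"
    print(xsltproc_command("guide.xml", "guide.fo", stylesheet, read_papersize()))
```

The stylesheet never touches /etc/papersize here; a wrapper (xmlto or a Makefile) does the lookup and the stylesheet only sees an ordinary parameter.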

| - FO -> PDF using libfop-java.  Looked ugly and didn't work properly
| last time I tried, and depends on a bunch of packages which are *not* in
| Debian.

FOP is...problematic.

| - FO -> PDF using xmltex.  This seems to work nicely, except it gives
| dozens of warnings of the kind 
| Font shape `U/msb/m/n' in size <4.04994> not available
| (Font)              size <5> substituted on input line 208
| which seem to have no bad effects, though.

TeX can also be tricky to set up.

| - FO -> DVI using xmltex.  One might think that this should work the
| same as FO -> PDF, but this is not the case.  It works with toy "sample"
| documents, but not with longer and more complex ones.

Using TeX to read FO and produce PDF is an interesting exercise, but I
don't think it's being actively developed. As far as free FO
formatters go, I'm holding my breath for
https://sourceforge.net/projects/xmlroff/ (Fair disclosure: I work for
Sun. But I'd still support Tony even if I didn't.)

| Now to xmlto.  xmlto (should) automate this entire process by calling
| the appropriate helpers, deleting temp files etc.  It uses a hack to
| respect /etc/papersize, but entities like &mdash; or even more important
| ones such as &auml; do not work.  As almost every serious XML document
| contains some entities at some point, it's basically unusable to create
| PDF.

How can xmlto have any impact on the use of entities in your document?
Do you have a <!DOCTYPE declaration? If you do, you should get
entities. If you don't, well, then it's working as expected :-)
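To illustrate the point with Python's standard ElementTree parser (not the DocBook toolchain itself): the internal subset below is an assumed stand-in for the DocBook DTD, whose ISO entity sets declare &mdash; and &auml;. With the declaration the references expand; without it the parser rejects them as undefined, regardless of which wrapper invoked it.

```python
import xml.etree.ElementTree as ET

# Stand-in for the DocBook DTD: declares the two entities the report mentions.
WITH_DOCTYPE = """<!DOCTYPE para [
  <!ENTITY mdash "&#x2014;">
  <!ENTITY auml "&#xE4;">
]>
<para>K&auml;se&mdash;Brot</para>"""

# Same content with no DOCTYPE, so the entities are never declared.
WITHOUT_DOCTYPE = "<para>K&auml;se&mdash;Brot</para>"


def root_text(source):
    """Return the root element's text, or None if the parser rejects the input."""
    try:
        return ET.fromstring(source).text
    except ET.ParseError:
        return None


if __name__ == "__main__":
    print(repr(root_text(WITH_DOCTYPE)))
    print(repr(root_text(WITHOUT_DOCTYPE)))
```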

| The problems - as far as I can see - are the following.  
| The transformation is very complex; it is distributed onto several
| packages: docbook-xsl, xsltproc (or some other xslt processor), xmltex,
| passivetex, tetex-bin, and optionally xmlto.  

Yeah. I'm working on becoming a Debian developer and I'd be happy to
help arrange things so that they're easier.

| Nobody really seems to know where the errors originate, and the
| maintainers seem keen to reassign and close reported bugs just because
| it worked on their machine with some sample document.

Hmm. I don't believe that describes how I handle bugs, but apologies
if I have.

| Many of these problems do not appear with small samples, and it is thus
| difficult to track them down.

Too true.
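One way to narrow down where an error originates is to run each stage separately and stop at the first one that fails. A rough Python sketch, assuming `xsltproc` for XML to FO and PassiveTeX's `pdfxmltex` for FO to PDF; the stage names, the `.fo` file naming, and the `runner` hook are hypothetical, chosen so the logic can be exercised without the tools installed.

```python
import subprocess


def build_stages(xml_file, stylesheet, paper_type="A4"):
    """Return (stage name, argv) pairs for a hypothetical XML -> FO -> PDF run."""
    fo_file = xml_file.rsplit(".", 1)[0] + ".fo"
    return [
        ("XML to FO (xsltproc)",
         ["xsltproc", "--stringparam", "paper.type", paper_type,
          "-o", fo_file, stylesheet, xml_file]),
        # PassiveTeX's pdfxmltex typesets the .fo file into PDF.
        ("FO to PDF (pdfxmltex)", ["pdfxmltex", fo_file]),
    ]


def first_failing_stage(stages, runner=subprocess.run):
    """Run stages in order; return the name of the first one that fails, else None."""
    for name, argv in stages:
        try:
            result = runner(argv)
        except FileNotFoundError:
            return name  # the tool for this stage is not installed
        if result.returncode != 0:
            return name
    return None
```

Pointing this at one of the larger debian-doc documents, rather than a toy sample, is where it would help most, since the failures reported above tend not to show up on small inputs.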

| I'd very much appreciate it if somebody with more knowledge of xslt and tex
| than I have could look into the problem; I also think that it would be
| much better if the maintainers of the relevant packages would check the
| toolchain with some "real" documents (as available from
| http://cvs.debian.org/*checkout*/?cvsroot=debian-doc) instead of some
| upstream-supplied sample documents.

And therein lies part of the problem. While I accept responsibility
for the stylesheets and I'll do my part to make them work better for
Debian, I can't test the whole toolchain.

I wouldn't be surprised if all of the maintainers are in a similar position.

                                        Be seeing you,

-- 
Norman Walsh <ndw@nwalsh.com> | Are you not the future of all the
http://nwalsh.com/            | memories stored within you? The future
                              | of the past?--Valéry