d-i manual: xml status report
Hi.
As the release of sarge is nearing, I wrote something like a status
report about its install manual at
http://www.debian.cz/~kurem/status.html
[Thanks to Denis Barbier for reading the first draft]
I'd like your comments on the technical details (for example:
which processing tools are available on Debian machines? How
non-free is fop?)
Here is the relevant part from the URL above:
Technical side of the beast
As you may have noticed, the new install manual for sarge is
written in the XML version of DocBook, which is preferred to the
old SGML DocBook. This switch had to be made at some point,
because SGML DocBook will no longer be supported by its authors
in the future. The switch involves several things:
1. We have to use proper XML tags instead of wild SGML
shortcuts.
2. We have no more marked sections.
3. We have to choose a new build system.
1. Proper xml tags
==================
Parts of the old b-f (boot-floppies) manual were converted to
the new one semi-automatically, with a lot of manual work on
top. As far as I can tell, the parts of the d-i manual written
so far are valid XML DocBook: at least xsltproc doesn't
complain, and I can build HTML and PDF out of them (more about
that later). For newcomers, there is a nice introduction in
debian-installer/doc/manual/cheatsheet.xml.
2. No more marked sections
==========================
Marked sections allowed us to branch the actual text depending
on the architecture we are building for, as well as on other
conditions, like:
Use <![ %s390; [ tapes ]]>
<![ %supports-floppy-boot; [ floppies ]]>
to boot the system.
These marked sections can be rewritten using the profiling
mechanism of XML DocBook:
Use <phrase arch="s390">tapes</phrase>
<phrase condition="supports-floppy-boot">floppies</phrase>
to boot the system.
This has also been done for the parts written so far, and it
works fine as far as I can tell.
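To make the effect of profiling concrete, here is a minimal Python sketch that imitates what a profiling pass does to the example above. It is not the DocBook XSL implementation, only an illustration: it drops every element whose profiling attribute is set but does not list the wanted value (alternatives separated by ";"). For brevity, the sample marks both phrases with arch, which is an assumption; the original example uses a condition attribute for floppies.

```python
import xml.etree.ElementTree as ET


def profile(elem, attr, wanted):
    """Drop descendants whose `attr` is set but does not list `wanted`.

    An attribute value may hold several alternatives separated by ';'.
    Text that follows a dropped element (its tail) is preserved.
    """
    prev = None
    for child in list(elem):
        value = child.get(attr)
        if value is not None and wanted not in value.split(";"):
            # Re-attach the text that followed the dropped element.
            tail = child.tail or ""
            if prev is None:
                elem.text = (elem.text or "") + tail
            else:
                prev.tail = (prev.tail or "") + tail
            elem.remove(child)
        else:
            profile(child, attr, wanted)
            prev = child
    return elem


# Illustrative sample only; both phrases use arch= for simplicity.
SOURCE = (
    '<para>Use <phrase arch="s390">tapes</phrase>'
    '<phrase arch="i386">floppies</phrase>'
    " to boot the system.</para>"
)


def render(arch):
    """Return the plain text of the sample as built for one architecture."""
    para = profile(ET.fromstring(SOURCE), "arch", arch)
    return "".join(para.itertext())


for arch in ("s390", "i386"):
    print(arch, ":", render(arch))
```

Each architecture gets only its own phrase, which is the behaviour the old conditional markup provided.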
The other thing is that marked sections are used not only in
the text, but also in the "metadata" definitions (the infamous
*.ent files), which is much trickier to get right. I rewrote
these definitions so that they work, but the source looks quite
ugly in places. This also means that there is no place for
language-specific entities in the *.ent files. Such entities can
be (re)defined at the top of the install.XX.xml file, if the
translators so desire.
The last issue with profiling is that some entities can't be
profiled, because they are used inside an XML attribute such as
a url:
<ulink url="some-url/&architecture;">
which would expand to something forbidden like
<ulink url="some-url/<phrase arch='i386'>i386</phrase>
<phrase arch='m68k'>m68k</phrase>">
After some grepping through the sources, it seems that the
non-profilable entities are just &architecture;, &langext; and
&downloadable-file;. (I didn't mention &releasename;, but it is
a non-issue, because it always holds a single value common to
all arches; I suppose everybody anxiously awaits sarge, right?)
All non-profilable entities have to be handled by the build
system, for example by shuffling symlinks or by rewriting the
content of some dynamic file before each arch build (this is
what my proof of concept build system does).
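The "dynamic file" approach can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code from the proof of concept: the file name dynamic.ent and the helper names are hypothetical, and entity values are assumed to contain no double quotes. The last helper parses a tiny document to show that a plain-text entity resolves fine inside an attribute.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def dynamic_entities(values):
    """Render one <!ENTITY ...> declaration per name/value pair."""
    return "".join(
        f'<!ENTITY {name} "{value}">\n' for name, value in values.items()
    )


def write_dynamic_file(directory, values):
    """Overwrite the per-build entity file (the file name is hypothetical)."""
    path = Path(directory) / "dynamic.ent"
    path.write_text(dynamic_entities(values), encoding="utf-8")
    return path


def resolved_url(declarations):
    """Parse a tiny document that uses &architecture; inside an attribute."""
    doc = (
        f"<!DOCTYPE para [\n{declarations}]>\n"
        '<para><ulink url="some-url/&architecture;"/></para>'
    )
    return ET.fromstring(doc).find("ulink").get("url")


with tempfile.TemporaryDirectory() as workdir:
    # Rewrite the file before each architecture build.
    for arch in ("i386", "m68k"):
        ent_file = write_dynamic_file(workdir, {"architecture": arch})
        print(arch, "->", resolved_url(ent_file.read_text(encoding="utf-8")))
```

Because the entity holds plain text rather than profiled markup, the attribute stays well-formed for every architecture.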
3. Build system
===============
Because of point 1, we have to use a different set of tools to
produce .html, .txt, or .pdf output. Because of point 2, we need
a different way to pass profiling parameters to the processing
tools.
Let me start with the latter. Because we've lost marked sections
and conditional branching (<!ENTITY % blah "IGNORE"> or
<!ENTITY % blah "INCLUDE">), we need to organize the *.ent files
in a slightly different way and move some information out of
these files into the build script. (See my proof of concept
below.)
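As a rough illustration of what "information in the build script" can look like, the Python sketch below assembles an xsltproc command line that passes profiling values as stylesheet parameters. profile.arch and profile.condition are DocBook XSL profiling parameters, and --stringparam and -o are xsltproc options; the per-architecture condition table and the stylesheet name style-html.xsl are assumptions made up for the example, and the stylesheet is assumed to be profiling-aware.

```python
# Hypothetical table: which conditions hold on which architecture.
ARCH_CONDITIONS = {
    "i386": ["supports-floppy-boot"],
    "s390": [],
}


def xsltproc_command(stylesheet, source, output, arch):
    """Build the argument list for one profiled xsltproc run."""
    cmd = ["xsltproc", "--stringparam", "profile.arch", arch]
    conditions = ARCH_CONDITIONS.get(arch, [])
    if conditions:
        # Several values are joined with ';', the default profiling separator.
        cmd += ["--stringparam", "profile.condition", ";".join(conditions)]
    cmd += ["-o", output, stylesheet, source]
    return cmd


for arch in ARCH_CONDITIONS:
    # Only printed here; pass the list to subprocess.run() to execute it.
    print(" ".join(xsltproc_command(
        "style-html.xsl", "install.en.xml", f"install.{arch}.html", arch)))
```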
Back to the former item: the new toolchain. There are basically
three options:
1. Use XSL stylesheets and an XSL processor (xsltproc, saxon) to
get nice .html and .fo (Formatting Objects). FO can be
transformed to various formats (.txt, .ps, .pdf) with an FO
processor (fop, xmltex/passivetex).
2. Use DSSSL stylesheets and (open)jade to get .html and .rtf.
When we throw jadetex in, we can also get .pdf and .ps.
3. Use something like docbook2latex, but I don't consider this
a viable alternative.
In general, the XSL way is more modern and is The Way To Go(TM);
the glory of DSSSL is fading. On the other hand, saxon and fop
are java programs, so they depend on "non-free" java (I don't
know whether they work with gcj or kaffe). Xsltproc does its
work marvelously, but the version in stable dies hard on
profiling (you have to use the testing/unstable version). I also
recommend using the XSL stylesheets from testing/unstable. I've
heard some bad things about passivetex, but I can't confirm them
myself, because it is not in stable and installation of the
unstable version fails partway through. Fop has some quirks with
accented characters, and its line layout is far from the TeX
output we are used to.
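The two-step route of option 1 (XML to .fo with xsltproc, then .fo to .pdf with fop) can be sketched in Python like this. It only assembles the two command lines; the stylesheet name style-fo.xsl and the output base name are hypothetical, and running the commands of course requires both tools to be installed.

```python
def pdf_pipeline(source, fo_stylesheet, basename):
    """Return the two commands that turn DocBook XML into a PDF."""
    fo_file = basename + ".fo"
    pdf_file = basename + ".pdf"
    return [
        # Step 1: XSL transformation to Formatting Objects.
        ["xsltproc", "-o", fo_file, fo_stylesheet, source],
        # Step 2: the FO processor renders the .fo file to PDF.
        ["fop", "-fo", fo_file, "-pdf", pdf_file],
    ]


for command in pdf_pipeline("install.en.xml", "style-fo.xsl", "install.en"):
    # Only printed here; pass each list to subprocess.run() to execute it.
    print(" ".join(command))
```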
Conclusion for technical side:
==============================
I wrote a proof of concept build system, which I use to verify
my work on the XML structure of the document. You can download
it from
http://www.debian.cz/~kurem/build.tar.gz.
It consists of the *.ent files updated according to point 2 and
the file build.sh, which calls the script buildone.sh for each
language and architecture to build. (Some management code could
go here, like moving the just-built doc to some safer
location...) buildone.sh sets up profiling and calls the right
tools (change the variables inside to suit your needs). There
are also three style-*.xsl files, which can be used to customize
the output. Just grab the current
debian-installer/doc/manual/en, drop it into the unpacked
directory and run e.g. ./buildone.sh powerpc en.
The toolchain used consists of xsltproc (.html and .fo) and fop
(.pdf).
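For readers who prefer to see the driver loop spelled out, here is a hedged Python sketch of what build.sh is described as doing: one buildone.sh call per language and architecture. It is not the actual script; the argument order follows the "./buildone.sh powerpc en" example above, the language and architecture lists are placeholders, and the loop defaults to a dry run.

```python
import subprocess
from itertools import product


def plan(languages, architectures):
    """List one buildone.sh invocation per language/architecture pair."""
    return [
        ["./buildone.sh", arch, lang]
        for lang, arch in product(languages, architectures)
    ]


def build_all(languages, architectures, dry_run=True):
    """Print each invocation; execute it only when dry_run is False."""
    for command in plan(languages, architectures):
        print(" ".join(command))
        if not dry_run:
            # Stops at the first failing build.
            subprocess.run(command, check=True)
            # Management code (e.g. moving the built doc away) could go here.


# Placeholder lists, chosen only for the example.
build_all(["en", "cs"], ["i386", "powerpc"])
```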
I'm not a DD, so I'd like to hear your opinion about this. I do
understand that we will need some DD to write a nice build
system, which will be flexible and much more dynamic, so please
don't bash my coding style; comment on the overall idea instead.
--
Miroslav Kure