PROPOSAL: SGML & XML two-part proposal overview
I would like to bring forward a two-part proposal, submitted by Eric
Bischoff, from an SGML & XML workgroup that has spent a lot of time and
energy on its draft. Enclosed is an introduction/overview of the two-part
proposal. Following this email, the first part will be sent to the
FHS and the second part will be sent to the LSB.
This document explains the choices behind the DocBook recommendations
to the LSB. It is presented separately so that the draft itself can
concentrate on the "what?" rather than on the "why?".
There are hundreds of man-hours behind these recommendations. They
really cost blood, sweat and tears. Each line was discussed many times,
and the overall architecture changed quite often. We really tried to
hear what everyone had to say, so we would encourage the LSB to be very
careful if it wants to modify them.
The general philosophy was to keep the "historical" choices wherever
doing so had no consequences, and to make the "best" technical choice
wherever it mattered. We have attempted to design a very simple but
also powerful architecture, fully respecting the FHS (Filesystem
Hierarchy Standard).
Another general design principle was to think of the user, not of
the theory. There were many models that would have been much more
intellectually satisfactory - but they were all too complex for
Why those definitions? Because we realized we were speaking about
different things with the same words. An example is "SGML application":
it can refer either to a specific DTD or to a computer program meant to
process SGML or XML documents, and both definitions are perfectly
correct. To avoid any potential confusion, we just chose the one
Some definitions, such as "helper", "backend" and "frontend", are not
even needed to read the rest of the document. We kept them because we
needed them to provide a reference implementation.
R001 - SGML Directory layout
Some existing projects were putting files in /usr/lib/sgml, others in
/usr/local/share/sgml. These files are neither libraries nor local to a
given system, so we chose /usr/share/sgml.
Some projects used to put centralized catalogs in the same place as the
other catalogs. Since they can be seen as system configuration files,
it was logical to centralize them in /etc.
One very hard question was: should we separate SGML from XML?
The relationship between the two is very strong, so we chose to keep
them in the same place in the directory tree. This allows, for example,
all DocBook material, both SGML and XML, to live in the very same
place, which is obviously practical.
While /usr/share/sgml does not explicitly reflect this, we found that
it was still better than /usr/share/markup (what about TeX, then?),
/usr/share/ml, or the other proposals.
Why have fixed file paths when they could have been obtained from
configuration variables, autoconf mechanisms, etc.? First, because it
is simpler: we wanted a very strong standard, and tools remain free to
use such configuration variables or autoconf mechanisms to adapt to
non-LSB platforms. Second, we considered that a standard that does not
specify enough effectively encourages the most bizarre variations.
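Because the paths are fixed, a tool running on an LSB system can simply
hard-code them. The Python sketch below shows this for the two roots
named above, /usr/share/sgml and /etc/sgml. The helper names, the package
directory used in the example and the ".cat" suffix for centralized
catalogs are assumptions made for illustration, not part of the
recommendations.

```python
from pathlib import PurePosixPath

# The two fixed roots discussed above.
SGML_ROOT = PurePosixPath("/usr/share/sgml")  # DTDs, entities, stylesheets, ...
CATALOG_DIR = PurePosixPath("/etc/sgml")      # centralized catalogs


def ordinary_catalog(package_dir: str) -> PurePosixPath:
    """Ordinary catalog shipped inside a package directory.

    Ordinary catalogs are named "catalog" (lowercase); package_dir is a
    hypothetical path relative to SGML_ROOT.
    """
    return SGML_ROOT / package_dir / "catalog"


def centralized_catalog(dtd_name: str) -> PurePosixPath:
    """Centralized catalog for one DTD (the ".cat" suffix is an assumption)."""
    return CATALOG_DIR / f"{dtd_name}.cat"


print(ordinary_catalog("docbook/sgml-dtd-4.1"))
print(centralized_catalog("sgml-docbook-4.1"))
```

No lookup of environment variables is needed: on a conforming system the
answer is always the same.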
We chose a DTD-and-package-oriented architecture instead of a
file-type-oriented one.
This was probably the most controversial issue. The "natural" proposal for
SGML and XML specialists is to have the FPIs map almost letter for letter
onto directory names. However, this approach does not take advantage of
the catalog mechanism, which maps FPIs to file paths.
A file-type-oriented architecture would have led to things like:
or something further removed from the FPIs, like:
but in every case, the files would have been spread across distant
directories according to their file types. We would probably have had
entities in one place, stylesheets in another, DTDs in a third, and SGML
declarations in a fourth. This would certainly have broken some
relative paths and required more packaging work.
The user does not think in terms of file types, whereas SGML specialists
do. The user only thinks "I want to do some MathML" or "I want to do some
XHTML" or "I want to do some TEI". This is why the basic unit is the DTD.
This DTD-centered approach does not mean that first-level directories are
for DTDs. It just means that they hold everything related to a given DTD:
stylesheets, enterprise-wide customizations, etc.
R002 - DocBook Directory layout
The document may seem confusing because it mixes recommendations
for SGML and XML with recommendations for DocBook. It might have been
better to split it into two documents. On the other hand, this allowed
us to think in very practical terms.
There is only one lower level of directories. Each directory name is
loosely defined as holding one "package". One advantage is that this
maps closely onto RPM or DEB packages. The other advantage is that the
tree stays very flat, which eases hacking, packaging and maintenance by
system administrators.
The lower level directories are version-numbered. This unusual naming
scheme is intended to permit documents that are written using several
versions of the same DTD to coexist on the same system.
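As an illustration of how versioned directories let several versions
coexist, here is a small Python sketch that lists the installed versions
of one package, oldest first. It assumes a hypothetical
"<package>-<version>" naming scheme (the function names are also
hypothetical) and compares versions numerically, so that 4.10 sorts
after 4.2.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn "4.1" into (4, 1) so that versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def installed_versions(package: str, dir_names: list[str]) -> list[str]:
    """Return the versions of `package` found among directory names of the
    hypothetical form "<package>-<version>", oldest first."""
    prefix = package + "-"
    found = []
    for name in dir_names:
        if name.startswith(prefix):
            version = name[len(prefix):]
            # Keep only purely numeric dotted versions such as "4.1".
            if all(part.isdigit() for part in version.split(".")):
                found.append(version)
    return sorted(found, key=parse_version)


print(installed_versions("dtd", ["dtd-4.1", "dtd-3.1", "dtd-4.10", "stylesheet-1.64"]))
```

Because each version lives in its own directory, a document written
against 3.1 and another written against 4.1 can both be processed on the
same machine.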
R003 - Open Catalog usage for SGML
Why focus so much on catalogs in these recommendations? Because they are
the key to the directory structure and provide a strong, working
infrastructure that every SGML or XML tool can count on.
Open catalogs have often been disliked because they lead to problems
such as conflicting SGMLDECL entries. However, those problems do not
appear if the catalogs are used carefully. One of the keys is to avoid
lumping everything together, and to have centralized catalogs that are
specific to a given DTD.
The fact that they are DTD-specific has a number of advantages:
- they avoid SGMLDECL conflicts without assuming DTDDECL or
  DELEGATE support, which many tools do not support yet
- they avoid duplicate FPI declarations
- they point to the right version of a given DTD, and to the
  corresponding entities and style sheets, from a single place
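To make the duplicate-FPI point concrete, the following Python sketch
scans the ordinary catalogs that a single centralized catalog would chain
together and reports any FPI declared more than once. The parser is
deliberately simplified: it strips "-- ... --" comments and only
recognizes PUBLIC entries whose FPI is double-quoted. It is not a full
OASIS catalog parser, and the catalog names in the usage lines are
hypothetical.

```python
import re
from collections import defaultdict

# Simplified grammar: comments are "-- ... --"; PUBLIC "fpi" sysid.
COMMENT = re.compile(r"--.*?--", re.DOTALL)
PUBLIC = re.compile(r'PUBLIC\s+"([^"]*)"\s+"?([^"\s]+)"?')


def public_entries(catalog_text: str) -> list[tuple[str, str]]:
    """Extract (FPI, system identifier) pairs from catalog text."""
    text = COMMENT.sub(" ", catalog_text)
    return PUBLIC.findall(text)


def duplicate_fpis(catalogs: dict[str, str]) -> dict[str, list[str]]:
    """Map each FPI declared more than once to the catalogs declaring it."""
    seen = defaultdict(list)
    for name, text in catalogs.items():
        for fpi, _sysid in public_entries(text):
            seen[fpi].append(name)
    return {fpi: names for fpi, names in seen.items() if len(names) > 1}


catalogs = {
    "dtd-4.1/catalog": 'PUBLIC "-//OASIS//DTD DocBook V4.1//EN" "docbook.dtd"',
    "extra/catalog": '-- copied by mistake --\n'
                     'PUBLIC "-//OASIS//DTD DocBook V4.1//EN" docbook.dtd',
}
print(duplicate_fpis(catalogs))
```

With one centralized catalog per DTD, such a check only needs to look at
the handful of catalogs that the centralized catalog actually references.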
Splitting CATALOG pointers into one file per DTD, however, loses the
global view of all the catalogs installed on the system. This is why we
introduced the super catalog, which points to all the centralized
catalogs on the system. It greatly simplifies scripting.
The super catalog may be used as a default centralized catalog, for
example when the DTD is not known; however, there is no guarantee that
declarations will not conflict if an application chooses to use it
this way.
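A super catalog can be generated mechanically from the list of
centralized catalogs. The Python sketch below emits one CATALOG entry (an
entry type of the OASIS Open Catalog format that pulls in another catalog)
per centralized catalog. The function name and the file names in the
example are hypothetical.

```python
def super_catalog(centralized: list[str]) -> str:
    """Build a super catalog: one CATALOG entry per centralized catalog.

    Duplicates are dropped and entries are sorted so that regenerating
    the file after a package install gives a stable result.
    """
    lines = ["-- Super catalog: points to every centralized catalog --"]
    for path in sorted(set(centralized)):
        lines.append(f'CATALOG "{path}"')
    return "\n".join(lines) + "\n"


print(super_catalog(["/etc/sgml/xml-docbook-4.1.cat",
                     "/etc/sgml/sgml-docbook-4.1.cat"]))
```

A packaging script could call such a generator from its post-install and
post-remove steps, so that the super catalog always reflects the
centralized catalogs actually present.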
OASIS says that all catalogs should be named "CATALOG" or "catalog".
This was impossible to respect in /etc/sgml, which holds the centralized
catalogs, because several files in one directory cannot share the same
name. This does not really break the OASIS rule, because all the
ordinary catalogs on the system are still named "catalog".
We also chose to specify "catalog" rather than "CATALOG", while OASIS
leaves the choice open. We considered that we should encourage one of
the two, whichever it was, because that makes life simpler for everyone
(scripts, maintainers, packagers, tool authors, ...). In this respect,
LSB implementations could be considered conformant to OASIS, while the
converse would not be true.
R004 - Open Catalog usage for DocBook
Directories such as the ones holding Jade's or OpenJade's declarations
and the ISO entities are at the top level because they are not specific
to any given DTD and can be used by two or more of them.
Of course, one may argue that Jade's or OpenJade's declarations contain
the document type definition of DSSSL itself. But again, what matters is
usage, not formal definition, so there is no reason to put them in a
dsssl/ directory (which would also encourage packagers to put the
stylesheets there, away from their DTD, which is not what we want).
R005 - Configuration files
This recommendation is deliberately vague, to leave authors of SGML
applications as much creative freedom as possible with respect to
configuration files. In any case, the catalog layout already solves one
of their major problems: finding the files.
R006 - Iso-entities
So far, the greatest confusion has concerned the names of the files
holding these very basic character entities. We have seen the following
naming schemes:
ISOamsa ISOamsb ...
ISOamsa.ent ISOamsb.ent ...
iso-amsa iso-amsb ...
iso-amsa.gml iso-amsb.gml ...
There was a similar confusion for the formal public identifiers (FPIs)
describing these files. We have seen the following naming schemes:
"ISO 8879:1986//ENTITIES Added Math Symbols: Arrow Relations//EN"
"ISO 8879-1986//ENTITIES Added Math Symbols: Arrow Relations//EN"
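Until one form wins, tools can at least recognize the variants as
equivalent. The Python sketch below reduces the four file-naming schemes
listed above to a bare entity-set name and treats both ISO 8879 owner
identifier spellings as one. The function names are hypothetical, and the
sketch arbitrarily uses the colon form as its comparison key; this is not
meant to reflect which form the draft recommends.

```python
EXTENSIONS = (".ent", ".gml")


def entity_set_key(filename: str) -> str:
    """Reduce any of the four file-naming schemes to a bare set name,
    e.g. "ISOamsa.ent" and "iso-amsa.gml" both become "amsa"."""
    name = filename
    for ext in EXTENSIONS:
        if name.endswith(ext):
            name = name[: -len(ext)]
            break
    name = name.lower()
    # Try the longer prefix first so "iso-amsa" does not keep its hyphen.
    for prefix in ("iso-", "iso"):
        if name.startswith(prefix):
            return name[len(prefix):]
    return name


def fpi_key(fpi: str) -> str:
    """Treat "ISO 8879-1986//..." and "ISO 8879:1986//..." as the same FPI."""
    old = "ISO 8879-1986//"
    if fpi.startswith(old):
        return "ISO 8879:1986//" + fpi[len(old):]
    return fpi


for n in ["ISOamsa", "ISOamsa.ent", "iso-amsa", "iso-amsa.gml"]:
    print(n, "->", entity_set_key(n))
```

Such a mapping is only a workaround: it hides the inconsistency from one
tool but does not remove it from the installed files.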
Again, we chose not to leave this undecided. We received a lot of
feedback from users suffering from this indecision. Even if technical
workarounds exist, we would like to encourage one of these forms to
emerge.
R007 - Packages
We are very far from providing inter-distribution compatibility at the
package level, and it is likely that someone will get broken dependencies
if he/she mixes packages coming from different distributions.
This document will neither fix package names nor propose dependency
declarations for DocBook distributions. We did, however, want to point out
a problem that may be encountered when packaging SGML or XML: the package