
Re: document registration policy needing to be written

On 11 Apr 1998, Adam P. Harris wrote:

> Christian, it seems that we agree a lot,

Good! ;-)

> > I disagree. I still think that registering documents to install-docs does
> > make sense, even if dwww and dhelp share a common format:
> The first thing I want to state is that I fully agree we should have a
> small, thin, very simple "put my document into the registry" package.
> I feel this package should grow out of doc-base, and that it should
> *not* be coupled with any particular presentation or conversion system
> (dwww, dhelp, magic-doc-convert).  OTOH, I'm also trying to build
> bridges to Marco Budde, to bring him on board.  I feel a little
> tension between dhelp and doc-base, as if he doesn't feel doc-base has
> any right to be.  ;)

I wonder from which part you noticed the `tension'? :) But your feeling
was right: there is a tension. I explain my position below. Perhaps we can
sort out some of the conflicts by discussing the problems frankly in
public. (Hence the CC: to Marco--I want to make sure he gets this

(Please note that the intention is not to widen the gap between
dhelp and doc-base, or between Marco and me! The intention is to talk
about the problems I see and to solve them! Remember that we are all
`fighting on the same side', even if it doesn't always look so.)

There is a major difference between dhelp and doc-base in the `development
philosophy': dhelp is a `one man project'. The initial version was
designed and implemented by Marco without input from other developers
(AFAIK, at least, not publicly) since he saw a need for something `better
than dwww'. This approach is perfectly alright--the procedure is _much_
faster than any other and you'll get good results a lot sooner. (See also
similar projects like deb-make, menu, lintian, etc.)

This style works fine for `standalone' programs, but doesn't work for
programs which need to cooperate with a lot of other packages--as in the
case of dhelp. Dhelp will only succeed if packages support it. But
getting it supported (by at least a majority of packages) would
require an official policy statement. However, such a statement
would need the approval of all developers--something which is very
unlikely to happen since people didn't have an opportunity to discuss the
design of dhelp first. (For example, I would have voted against the
SGML-like registry files.)

In contrast, I've chosen a much harder way for doc-base: There was a huge
discussion about `Documentation Policy' on debian-devel (starting in June
1997), where I was desperately seeking a compromise between the
different parties about how to design a new doc policy. After a few weeks
of heavy email discussion, we finally had a compromise which (in my
eyes) met everybody's needs. The solution was `doc-base'.

(Of course, dhelp and doc-base have different purposes. This is just to
explain the differences in development style.) 

Now, if you compare the time that was necessary to design and implement
dhelp with the time we needed for doc-base (nearly 10 months--and doc-base
isn't even close to being ready!), dhelp's style also has its advantages.
But in order to get doc-base supported by policy, the long way is necessary.

And this is also the reason why I'm so picky about doc-base's design: I
want to prevent the situation where we implement doc-base in some
direction `just to get something done', but end up with a design on which
the developers will never reach a compromise.

(Again, let me stress that I don't say/think that dhelp's procedure isn't
good! But we can't follow this procedure with doc-base.) 

Comments are appreciated!

> [...]
> >   * Even if you say dhelp/dwww will handle only HTML while doc-base will 
> >     handle all other formats, doc-base is required: there are currently
> >     3 different HTML formats that have been requested by the users during
> >     the last doc policy discussion:
> Ah, I need to read this discussion.  Was it on <debian-doc> or
> <debian-policy> ?

It was on debian-devel (and possibly debian-doc), starting in Jun 97. I
remember that the discussion was very long and used several (long)
threads. (IIRC, it was the largest discussion I ever tried to "manage" as
policy manager--that's why I'm so careful about not raising the same
discussion again ;-)

> > Of course, if you think install-docs would get too large if it does the
> > registry and format conversion, you could split the script into a
> > package-registry frontend and a conversion backend. 
> Yes, I think registry should be separate, but I'm pretty flexible.
> Until I get a more concrete idea of what the conversion infrastructure
> will look like, it's too early to decide.  Our architecting here
> should not rule out pinching off the registration system.


> > 2. When talking about filesystem structure, I'd suggest we check out the
> > new paths that will be required with FHS. (Debian will switch to FHS
> > soon.) Moving `registry' directories like /var/lib/dpkg is nearly
> > impossible (we'll not move this directory, for example) but this would be
> > required if we aimed for 100% compliance with FHS (which we won't). Therefore,
> > I'd suggest we use FHS paths right from the start.
> Agreed.  I think I'll move stuff now out of /usr/doc/<pkg>/.dhelp for
> doc-base installed packages.  I guess I wish I could just put it into
> /var/state/doc-base/dhelp-gen/<docid> rather than
> /var/state/doc-base/dhelp-gen/<docid>/.dhelp, but oh well.

Yes, these paths look ok. 

I'd suggest leaving the paths as they are now for hamm, and fixing this
in the slink version only.

> [Technical diversion: doc-base has a listing of, for each docid,
> whether we were registered to dhelp or dwww.  Given that, I should be
> able to safely unlink the old /usr/doc/<pkg>/.dhelp files that
> doc-base created and move them over.  A pretty crucial bug in doc-base
> right now (only last HTML file in control file is registered) is
> another good reason to reconstruct our .dhelp files anyway.  Actually
> I think I'm going to add a flag to install-docs to refresh/reinstall
> all installed document ids.  Comments requested.]


Note that with the old /usr/doc/<pkg>/.dhelp files it was necessary to
register all the different docs (doc-ids) in a single .dhelp file. If dhelp
can parse file names other than .dhelp, it's possible to have a ".dhelp"
file for each doc--not only for each package.

> > 3. We'd also have to watch out which files we put below /usr and which
> > below /var. As a rule of thumb, everything which is modified at installation
> > time _only_, can go into /usr. At run-time dynamically generated and
> > modified files must go into /var. Putting any dynamically generated files
> > into /var is also a good option since it would simplify the `purge'
> > process of doc-base. (That's important, see also #1 above.)
> Yes, that would be /var/state/doc-base or some such, from my reading
> of FHS.


> > >     * file format should be standardized, we should whip up a DTD and
> > >       make it true SGML; this will assist in format validation and
> > >       standardize file parsing
> > 
> > Oh, do you want to change doc-base's registry file format into a SGML
> > format?
> Yes, for slink, not for hamm.
> > I wouldn't like that for the following reasons:
> >
> >  * The format has to be supported by the package maintainers (only), so
> >    we should try to make life as easy as possible for them. The `dpkg
> >    style' control file syntax doc-base has used until now should be
> >    known to any developer already.
> I'm not ruling out backwards compatibility.  As for the dpkg control
> file, I'm lobbying (perhaps unwisely) to get that put into SGML also!

I predict you wouldn't get a consensus about that. At least, I would vote
against a SGML-like syntax. SGML would make it much harder for other
scripts to parse dpkg control files (e.g., Lintian), and I don't see _any_
advantages from the move to SGML--only disadvantages. (But of course, feel
free to start the discussion if you want.)

> >  * AFAICS, we don't need SGML `functionality' in the registry files.
> Why not?  Wouldn't it be nice to be able to use 'nsgmls' to validate
> our control file at package build time, to make sure we're ok?  I
> could see it being nice to have a script that automatically transforms
> document control files into valid HTML or some printed report.

It would even be easier to write a Perl script which validates the
dpkg-style control files. 
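
To give an idea of how small such a check can be, here is a minimal
sketch. It is written in Python rather than Perl, and both the function
name `validate_control` and the field-name rule are assumptions made for
illustration, not taken from doc-base or dpkg. It only checks the line
syntax: every line must be a `Field: value' line, a continuation line
that follows a field, or a blank paragraph separator.

```python
import re

# Assumed rule: a field name is letters, digits and hyphens, followed by a colon.
FIELD_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9-]*:")


def validate_control(text):
    """Return a list of (line_number, message) problems; empty means OK."""
    problems = []
    in_field = False  # True once the current paragraph has seen a field line
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            # A blank line ends the current paragraph.
            in_field = False
        elif line[0] in " \t":
            # A continuation line is only legal after a field line.
            if not in_field:
                problems.append((number, "continuation line without a field"))
        elif FIELD_RE.match(line):
            in_field = True
        else:
            problems.append((number, "expected 'Field: value'"))
    return problems
```

A maintainer could run a check like this at package build time and fail
the build if the returned list is not empty.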

No, nsgmls is no option. Remember that doc-base will run on every system
out there. It has been designed to only require parts which are present in
the base system (more accurately: which are provided by Essential packages).
This is important, and if we stay with this, it might be possible to tag
doc-base Essential too (or at least, assign it a high priority and
include it in the base system)!

> >  * Parsing SGML files is a lot more work and would require more CPU time
> >    at installation time, than to parse the simple dpkg control files.
> Yes, this is the crux, for me.  I actually don't rule out using either
> (a) a simple perl module wrapping around nsgmls in conjunction with a
> DTD, or (b) writing a perl module to parse SGML down to simple data
> structures (list of hashes comes to mind) on its own in such a way
> that it has 98% SGML (or XML) coverage.  There may (should!) be a std
> CPAN Perl module for this, but I haven't found it yet.
> Benefits to SGML for control files you may not have thought of:
>   * decouple parsing system from the particular format.  I.e., we can
>     add fields without having to also mess with the parsing engine.

Umm, we can also add fields to the dpkg-style syntax without touching
the parser...

>   * allow features not allowed in control file type format, i.e.,
>     comments, multidimensional fields (attributes, i.e.,
>     'language="de"'), looping, cross-referencing within the control
>     format.

How often would these features be needed? Languages are the only thing
that will become important (AFAICS now), but this could easily be done
with the Dpkg-style too.

BTW, Lintian makes heavy use of the dpkg-style files and has done well so
far. I've written a general purpose parser of dpkg-style control files in
Perl--if you want, you could update this part in the doc-base source.

> I'm pretty flexible, however.  Our current (RFC 822-style) parsing is
> fine; if we stick with it, we oughta beef it up and make sure
> continuation lines are accepted everywhere, document a comment field,
> etc.  I haven't dug into that side too deeply; I know dpkg has some
> problems with continuation lines in some fields.

It's only dpkg that has the problem--not the file format. My new library
function parses continued lines correctly.
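
To show that continued lines are not hard to handle, here is a minimal
sketch of such a parser. It is in Python, not the Perl library mentioned
above, and `parse_control` is a hypothetical name. It assumes that a line
starting with a space or tab continues the previous field and that blank
lines separate paragraphs.

```python
def parse_control(text):
    """Parse dpkg-style text into a list of paragraphs (dicts of field -> value)."""
    paragraphs, current, last = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current paragraph, if there is one.
            if current:
                paragraphs.append(current)
            current, last = {}, None
        elif line[0] in " \t":
            if last is None:
                raise ValueError("continuation line without a field: %r" % line)
            # Drop only the first whitespace character so that any deeper
            # indentation inside the field value is preserved.
            current[last] += "\n" + line[1:]
        else:
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError("missing colon: %r" % line)
            last = name.strip()
            current[last] = value.strip()
    if current:
        paragraphs.append(current)
    return paragraphs
```

Because the parser never looks at the field names, new fields can be
added to the files without touching it.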

(BTW, the "Lintian info" is also given in dpkg-style control files. I use
a mixture of dpkg-style and SGML there: The outer syntax is the
dpkg-style, but within a field you can specify HTML/SGML-like tags to
support different output fonts, etc. I've also written a general purpose
Perl library to process such files and translate them either into HTML or
plain text. Here is an example:

Tag: old-fsf-address-in-copyright-file
Type: error
Info: The /usr/doc/<i>pkg</i>/copyright file refers to the old postal
 address of the Free Software Foundation (FSF). The new address is:
   Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston,
   MA 02111-1307, USA.

Tag: shlib-without-dependency-information
Type: warning
Info: The listed shared libraries don't include information about which
 other libraries the library was linked against. (When running `<tt>ldd
 foo.so</tt>' ldd should report about these other libraries. In your
 case, ldd just reports `statically linked'.)
 To fix this, you should explicitly specify the libraries which are
 used (e.g., `-lc') when building the shared library with `ld'.
 If you have questions about this, please contact &debdev;.

Please note the <tt> and <i> tags, the two-space indented text to specify
a <pre>-formatted text, and the &debdev; tag to insert general `text

I think this format merges the advantages of dpkg-style and SGML-style:

  - _very_ easy to parse (only requires Perl and is fast)
  - easy to learn (.deb developers know dpkg-style and HTML-style already)
  - doesn't need additional software other than perl-base (included in
    the base system)
  - can easily be extended
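
As a rough illustration of the translation step, here is a minimal sketch
of turning one such field value into plain text or HTML. It is in Python,
not the Perl library described above; the names `to_text` and `to_html`
and the entity table are hypothetical, and the real expansion of &debdev;
is not given in this mail. It assumes the one-space continuation prefix
has already been stripped, so that preformatted lines start with two
spaces, and it does not escape any other HTML characters.

```python
import re

# Hypothetical entity table; the placeholder text is an assumption.
ENTITIES = {"debdev": "the Debian developers"}

TAG_RE = re.compile(r"</?[a-z]+>")
ENTITY_RE = re.compile(r"&([a-z]+);")


def expand(text):
    """Replace known &name; entities; unknown ones are left untouched."""
    return ENTITY_RE.sub(lambda m: ENTITIES.get(m.group(1), m.group(0)), text)


def to_text(info):
    """Plain text output: expand entities and drop the inline tags."""
    return TAG_RE.sub("", expand(info))


def to_html(info):
    """HTML output: keep inline tags, wrap two-space-indented runs in <pre>."""
    out, pre = [], False
    for line in expand(info).splitlines():
        is_pre = line.startswith("  ")
        if is_pre and not pre:
            out.append("<pre>")
        elif pre and not is_pre:
            out.append("</pre>")
        pre = is_pre
        out.append(line)
    if pre:
        out.append("</pre>")
    return "\n".join(out)
```

Both outputs come from the same field text, which is the point of mixing
the two styles: one source, a trivial outer parser, and only a handful of
inline tags to interpret.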

> My guess is that when we start to reach consensus on the featureset
> for slink's document mgmt system, the right choice will be obvious.


> >  * Just using a SGML-like file syntax but being picky about where line
> >    breaks and spaces may appear (that's how dhelp behaves now) is even
> >    worse, since the SGML-like syntax makes the maintainers think doc-base
> >    doesn't care about spaces--a failure. (This happened to me with dhelp.)
> I agree.
> > >   * adopt the menu hierarchy as a standard documentation hierarchy (de
> > >     facto; make it official)
> > >     TODO if so:
> > >     * beef up a little, cf my Bug#20936
> > >     * consider how this hierarchy might integrate or not with language 
> > >       specifiers.  dhelp uses
> > 
> > Yes, that's an important point. Note, that people suggested before that we
> > use the same menu structure which is also used for the `application
> > menus'. However, I don't think this structure works well with
> > documentation.
> There are a number of divergences.  I would like to get Joost involved
> here and let's resolve this issue for hamm release if possible.

Right. But if you want to do this for hamm, you'd have to be _very_ quick.
I guess it's already too late for this... :-(

> FYI, I'm planning on adding some basic document hierarchy validation
> to doc-base (will only emit warnings for unknown/unregistered
> hierarchies.)




--                  Christian Schwarz
                   schwarz@monet.m.isar.de, schwarz@schwarz-online.com
                  schwarz@debian.org, schwarz@mathematik.tu-muenchen.de
                PGP-fp: 8F 61 EB 6D CF 23 CA D7  34 05 14 5C C8 DC 22 BA
 CS Software goes online! Visit our new home page at
