Re: Documentation Policy
Bruce Perens wrote:
> From: "Scott K. Ellis" <email@example.com>
> > I have no problem when HTML is the provided upstream documentation source,
> > and don't want to cripple my ability to read that. However, when the
> > upstream source is something else, such as info/texinfo, I don't want HTML
> > as well.
> Well, you're going to need a script to implement that policy. Probably
> the best way to handle this is to provide a way to tell the package system
> that you have deliberately removed a file, and that this file should not
> be replaced. I wouldn't expect this in version 1.0 .
Your answer sounds as if you think this is a particular problem and not
a general one. I think this issue affects a considerable proportion of
Debian users and therefore a more general solution should be provided.
In my opinion, the current documentation policy is inadequate for the
following reasons:
*) It forces users to waste bandwidth and disk space, because it offers
no easy way to select which documents to keep (and move).
*) It is inconsistent: it treats some source formats (man) as acceptable
for binary packages and others (texinfo) as unacceptable. I have no use
for groff other than viewing man pages. Groff and texinfo have
equivalent functionality, yet they are treated differently.
*) It is inflexible: the packaging system should prevent users from
installing mutually conflicting packages, but it should allow them
to install any document at all.
*) It is incomplete. The policy says that users must be able to
view any document with an HTML browser, but it does not specify a
default method (i.e. a default web server and a default browser, both
installed as part of the base system).
If we compare this with what is needed to view a man page (and nobody
ever complains about that), we can see that something similar may be
necessary for other types of documentation.
A man page is in source format. Viewing it takes a compiler (groff), a
caching man page server (man) and a viewer (less). These are usually
installed by default on every system, and nobody complains that, for
example, a less bloated man page compiler should be used instead.
I suggest the following changes to the policy:
[Note: by "base system" I mean the basic default system, not just the
packages in "base".]
*) The default format should be HTML, but everything necessary to
view the documents should be provided as part of the base system.
This includes a default HTML viewer (lynx), which users can override
with the update-alternatives method. It would also include a default
web server, which should also be installed as part of the base system
and be changeable with the update-alternatives method (or maybe
conflict with the others?).
Suggested web servers: boa (small and fast), or cern (if you want
caching).
*) The package dwww should be marked important. It should provide
on-the-fly converters (as CGI programs) for as many formats as
possible. Each converter should be self-contained or depend only on
required packages.
*) A default search/indexing engine should be chosen. It would be
marked standard, but not important. (I don't know which one is good;
maybe Bruce's idea of shell+zgrep could be made into a package.)
*) Documents should be provided in the least processed (closest to
the original source) format for which an on-the-fly converter exists.
Given the choice of several formats, the most versatile one (which
can more easily be converted to other formats) should be chosen.
*) Until a better option is available, dpkg should include a script
that automates unpacking the /usr/doc/package part of a package
without installing the package. This would allow users to install the
documentation of packages that conflict with an installed one.
Users might need to remove the /usr/doc/package directory manually
when they no longer need it. Scripts to register and unregister
documents with dwww should be provided to handle this case properly.
*) The project should stick to the policy and not include alternative
formats or viewers by default. If we are convinced that HTML is the
format, then we must show it.
*) Man pages should be installed in raw format and converted to HTML
on-the-fly. Since a man->html converter which does not depend on groff
is possible, neither "groff" nor "man" should be installed by
default. Use of the "man" program should be discouraged. (Should we
still insist on having a man page for every program? Maybe, for Unix
compatibility. But man pages are one thing and the man program is
another.)
*) Texinfo sources should be installed in raw format and not in info
format. "Info" should be an optional package, which on installation
scans /usr/doc looking for texinfo files, compiles them and places
the output in /usr/info. On deinstallation it should erase /usr/info.
This means that "info" should depend on "texinfo", and it would be
needed to organize info files even for Emacs users. Emacs should not
provide info (though Emacs could still be used as an info browser).
In addition, dwww should have a hook that calls the info installer (if
present) whenever a texinfo document is registered.
If info is not installed, "info package" would invoke the regular
texinfo->html on-the-fly converter.
(Some Perl hacking may be needed to turn texinfo2html from a static
compiler into an on-the-fly CGI translator, but it should not be
difficult.)
*) TeX, SGML, groff, HTML and plain-text sources should be installed in
raw format, since they can either be converted to HTML easily or be
viewed with an HTML browser in some form. Good-looking conversions
would be nice, but functionality and format consistency should be the
main concern. A file explaining which packages are needed to generate
a specific output format from a given input format should be included
in the main Debian documentation. Those packages should provide
scripts that minimize the work users must do when they need to
generate alternative formats.
Note: remember that browsers can save plain-text (lynx), HTML or
PostScript (Netscape) copies of CGI-generated pages.
*) Documents originally in binary format (PS, DVI, PDF, MS Word) for
which no conversion is possible should be provided separately. The
binary package should include a file explaining how to get the
documentation (including which programs the user will need:
ghostscript, xdvi, MS Word) and a brief summary of the document
(developers _do_ read the documents, don't they?), in a convertible
format, of course.
*) Similarly, documentation in any format that exceeds a maximum size
should be provided as a separate package. However, an overview of
what the package does and of its basic functionality should be
included with the package, along with a reference to the rest of the
documentation.
*) The role of README.debian should be taken over by an index.html in
the /usr/doc/package directory. README.debian itself should contain a
short text explaining how to use "doc package" to view an index of a
package's documents. Similarly, we could consider displaying that text
when the user types "man package" or "info package" and man or info is
not installed. (Or just fall through to the HTML browser?)
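To make the shell+zgrep search idea a little more concrete, here is a rough Python sketch of what such a search could do: walk /usr/doc, read .gz files transparently the way zgrep does, and report matching lines. The function name zgrep_docs and the case-insensitive matching are illustrative assumptions, not part of any existing package:

```python
import gzip
import os
import re


def zgrep_docs(pattern, doc_root="/usr/doc"):
    """Yield (path, line_number, line) for every line matching `pattern`
    in files under doc_root, decompressing .gz files like zgrep does.
    Matching is case-insensitive (an assumption, like `zgrep -i`)."""
    regex = re.compile(pattern, re.IGNORECASE)
    for dirpath, dirnames, filenames in os.walk(doc_root):
        dirnames.sort()  # visit subdirectories in a stable order
        for fn in sorted(filenames):
            path = os.path.join(dirpath, fn)
            opener = gzip.open if fn.endswith(".gz") else open
            try:
                with opener(path, "rt", errors="replace") as fh:
                    for lineno, line in enumerate(fh, 1):
                        if regex.search(line):
                            yield path, lineno, line.rstrip("\n")
            except (OSError, EOFError):
                continue  # unreadable, corrupt or truncated file: skip it
```

A real package would probably add an index instead of scanning every file on each query, but a plain scan like this needs nothing beyond what is already installed.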
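The rule for choosing the least processed format could be sketched like this in Python. The DocFormat model, its depth field (number of conversion steps away from the upstream source) and the converters set are hypothetical, introduced only to illustrate the rule: among the formats that have an on-the-fly converter, take the least processed one, and break ties by versatility (the number of formats it can be converted into).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocFormat:
    """A format in which one document is available (hypothetical model)."""
    name: str
    depth: int          # conversion steps from the upstream source (0 = the source)
    targets: frozenset  # formats this one can be converted into


def choose_format(available, converters):
    """Return the least processed format that has an on-the-fly converter,
    preferring the most versatile one among equally processed formats.
    Returns None if no available format has a converter."""
    candidates = [f for f in available if f.name in converters]
    if not candidates:
        return None
    # Sort key: fewest processing steps first, then most conversion targets.
    return min(candidates, key=lambda f: (f.depth, -len(f.targets)))


if __name__ == "__main__":
    texinfo = DocFormat("texinfo", 0, frozenset({"info", "html", "dvi"}))
    info = DocFormat("info", 1, frozenset({"text"}))
    html = DocFormat("html", 1, frozenset({"text", "ps"}))
    docs = [texinfo, info, html]
    print(choose_format(docs, {"texinfo", "html"}).name)  # texinfo
    print(choose_format(docs, {"info", "html"}).name)     # html
```

With only info and html converters available, both formats are one step away from the texinfo source, so the tie goes to html because it converts to more formats.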
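The documentation-only unpacking script could be as simple as filtering the package's data tarball. This Python sketch assumes the tarball comes from `dpkg-deb --fsys-tarfile` (which writes a package's filesystem archive to standard output) and that documentation lives under usr/doc/<package>/; the function name extract_docs is hypothetical. It copies regular files only and skips any path containing "..":

```python
import os
import tarfile


def extract_docs(fileobj, package, dest, doc_root="usr/doc"):
    """Copy only <doc_root>/<package>/ out of a package's data tarball.

    fileobj is a readable binary stream of the tarball (for example the
    stdout of `dpkg-deb --fsys-tarfile foo.deb`). Returns the paths written.
    """
    prefix = f"{doc_root}/{package}/"
    written = []
    # "r|*" is tarfile's streaming mode, so non-seekable pipes work too.
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            name = member.name
            if name.startswith("./"):
                name = name[2:]
            # Keep regular files under the package's doc directory only;
            # symlinks, devices and everything else are skipped.
            if not member.isfile() or not name.startswith(prefix):
                continue
            # Refuse any path component that could escape dest.
            if ".." in name.split("/"):
                continue
            target = os.path.join(dest, name)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with tar.extractfile(member) as src, open(target, "wb") as out:
                out.write(src.read())
            written.append(target)
    return written
```

For example, passing the stdout of a `dpkg-deb --fsys-tarfile foo.deb` subprocess to extract_docs(stream, "foo", "/") would unpack only /usr/doc/foo. Registering the result with dwww, and removing it later, would remain separate steps.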
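The installation step of the optional "info" package could look roughly like the Python sketch below: scan the documentation tree for texinfo sources and compile each one into /usr/info with makeinfo. The function names, the recognized suffixes and the dry_run default are assumptions for illustration. One caveat: many manuals are split into several files joined with @include, and a real installer would have to compile only the top-level file, which this sketch does not attempt.

```python
import os
import subprocess

# Suffixes assumed to mark texinfo sources (an illustrative choice).
TEXINFO_SUFFIXES = (".texi", ".texinfo", ".txi")


def plan_info_builds(doc_root="/usr/doc", info_dir="/usr/info"):
    """Return (source, output) pairs for every texinfo file under doc_root.
    Note: manuals with the same file name in different packages would
    map to the same output file here."""
    plan = []
    for dirpath, dirnames, filenames in os.walk(doc_root):
        dirnames.sort()  # walk subdirectories in a stable order
        for fn in sorted(filenames):
            if fn.endswith(TEXINFO_SUFFIXES):
                stem = os.path.splitext(fn)[0]
                plan.append((os.path.join(dirpath, fn),
                             os.path.join(info_dir, stem + ".info")))
    return plan


def build_info(plan, dry_run=True):
    """Compile each planned file with makeinfo; with dry_run, only
    return the commands that would be run."""
    cmds = [["makeinfo", "--output", out, src] for src, out in plan]
    if not dry_run:
        for (_src, out), cmd in zip(plan, cmds):
            os.makedirs(os.path.dirname(out), exist_ok=True)
            subprocess.run(cmd, check=True)
    return cmds
```

Keeping the plan separate from the build makes it easy to reuse for deinstallation: the same list of outputs tells the package which files in /usr/info it created.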
I think all this is necessary if we really want HTML as the default
documentation format. If we chicken out of requiring a default browser,
a default server and a set of CGI converters as base packages, then we
should forget about HTML as the default and go with Chris's idea of
providing separate packages in different formats and letting users
choose.