
Re: End of Documentation Discussion


I kinda had noticed that some packages were going to HTML, but with a
new job and a pending move, I haven't been reading these lists as
closely as I probably should.

There are several VERY SERIOUS problems with making everything HTML:

 * HTML cannot do very much with formatting.  When a printed reference
   is needed, PostScript, SGML/LinuxDoc, TeX, etc. are MUCH, MUCH
   better.  And, keep in mind that TeX and SGML/LinuxDoc can be
   converted to HTML on-the-fly if somebody writes a simple CGI
 * HTML cannot be easily printed.  Things get split up into many
   files, and there are no printed index, page numbers, etc.
 * HTML cannot be easily grepped.  Your grep output picks up link
   markup, etc. instead of just plain text.
 * Few packages come with HTML docs already.
 * HTML is not easily searchable at this time.  At best, you can
   search the current page.  Otherwise, you have to allocate lots of
   disk space to some sort of index.
 * Some sites do not give users access to Web browsers.  For instance,
   one of my employers runs machines with dialup access with a limited
   number of lines.  People were calling in and using Lynx to browse
   the web, taking valuable dialup resources away from those that had
   work to do.  So Lynx was removed.  Other cases where this has
   happened include places where people have been abusing web access
   at the workplace.
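
To illustrate the grep point above, here is a rough sketch (in Python,
using only the standard html.parser module).  The page fragment and the
helper names are made up for illustration; this is not a tool that
exists in Debian.  It shows a search for "index" matching the raw HTML
only because of an href, while the visible text has no such word:

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect only the character data of a page, dropping all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags; attributes such as href never get here.
        self.chunks.append(data)


def visible_text(html):
    """Return the text a reader would see, without any markup."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


def matching_lines(text, word):
    """A crude stand-in for grep: lines of text that contain word."""
    return [line for line in text.splitlines() if word in line]


# Hypothetical page fragment, invented for this example.
page = '<p>See the <a href="index.html">manual</a> for details.</p>'

raw_hits = matching_lines(page, "index")                 # matches the href
text_hits = matching_lines(visible_text(page), "index")  # matches nothing

print("raw HTML hits:  ", raw_hits)
print("plain text hits:", text_hits)
```

So searching HTML sensibly means either stripping the markup first, as
above, or putting up with false hits from tags and attributes.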

I think that other formats have the following problems:
 * PostScript makes very nice printed output, but it is difficult to
   search and requires a fairly expensive graphical monitor to be
   able to read on-screen reasonably.
 * LaTeX also makes nice printed output and can be converted to
   HTML as well as other formats, but such conversion on-the-fly is
   not practical due to the huge size of the LaTeX system.
 * GNU Info has an awkward interface and is difficult to search.
   It is also nearly impossible to print an entire manual from the
   files in the info directory.
 * Manpages are portable, searchable, and produce nice printed output
   with man -t.  However, for very long manuals, they are not

I would suggest either of the following:
 * DVI format.  It can be converted to HTML (I think...) and plain
   text on-the-fly.  It can also be converted to PostScript to
   produce very nice printed documentation.  Several programs generate
   DVI and there are also viewers.  When converted to text format,
   DVI is easy to search; however, DVI is not natively very
   searchable.  Downside: conversion to PostScript requires
   significant disk resources (fonts!) and can be a lengthy process.
   On second thought, maybe DVI isn't the best choice... :-)

 * LinuxDoc/SGML.  This is probably the best choice.  It converts to
   HTML very nicely.  It can also be converted to PostScript (via
   LaTeX tools) and so a nice printed output can be obtained.  It
   can also be converted to ASCII text, and so can be searched
   without too much difficulty (but would be better if it was
   natively searchable).  LinuxDoc also has the advantage that
   there is a powerful pseudo-WYSIWYG editor (LyX) that can generate
   LinuxDoc output.  LinuxDoc tools are not so huge as TeX and so
   a system that does not need to bother with PostScript files
   would not need to install a large system.

   I believe that there are also LinuxDoc to Info converters as
   well as LinuxDoc to Manpage converters, but I could be mistaken.

Therefore, I propose that we rewrite portions of your document as

All packages should ideally provide manpages (although there are a few
exceptions).  Packages providing additional documentation should use
GNU info format or LinuxDoc/SGML format.  There should be a script or
program available to convert SGML to HTML on-the-fly (shouldn't be
hard since we already have the tools to do that).  Various other
documentation provided by the upstream author should be converted to
SGML if possible; if not, it should be included untouched.

(this is, of course, a draft proposal off the top of my head and could
no doubt use a lot of work yet...)

Just to summarize: I believe that HTML is a VERY BAD choice for
unification of documentation for the reasons outlined above.  I
actually am not sure that we should necessarily do much beyond plain
text, since anything past there requires additional resources that are
not necessarily available on all systems.  One of the most powerful
things about Debian is that it runs even on the oldest of hardware.
It would not be good if we require people to install X just to read

Christian Schwarz <schwarz@monet.m.isar.de> writes:

> Hi folks!
> I think we should come to an end of that _silly_ "Documentation Policy"
> discussion. Most people here are discussing topics that are already
> decided and topics that would need to be discussed are forgotten.
> So I'll make another proposal. This is meant to be a "compromise" that
> everyone here should be able to accept. I will _NOT_ accept simple
> "objections" this time. If you can't live with this proposal, you'll have
> to present another formulation of a paragraph or of the whole text. 
> Note, that this is _NOT_ the actual text that will be included in the
> Policy Manual. It's meant to contain the "facts" the new policy will be
> based on. When we have a consensus about that, I'll present the necessary
> policy changes here.
> Here is my proposal:
> ---------------------
> The unification of Debian documentation is being carried out via HTML.
> Thus, every documentation that is available in a format which can be
> converted into HTML, should be converted, with the exception of manual
> pages (they can be converted via dwww at run-time) and source code
> examples.
> In case of converted HTML documentation, the files with original mark up
> format should not be provided, unless they are considered as "example
> documents" for the mark up language.
> Packages that contain programs with GNU info manuals, should provide these
> in HTML _and_ in GNU info format. The HTML files should be stored in
> the directory
> 	/usr/doc/<pkg-name>/html-info/
> since the new package management system (deity) will be able to identify
> these files as info-converted HTML files, which may be removed by the
> local sysadmin.
> All documentation related files will be kept in the "main binary package"
> if they do not exceed 500 kbytes installed size together. (Of course,
> documentation-only packages are not covered by this rule.)
> ---------------------
> One question remains: Is it possible to browse "html.gz" files _without_
> a CGI script with the usual HTML browsers (Netscape, lynx)? If so, we'll
> make it policy to gzip all html files and to adapt the references. If not,
> we'll have to install all html files gzipped--or add a cgi capable web
> server to the base system.
> Note, that I checked all packages in "hamm/main" against these rules.
> We'll get about 26 new packages. (I added the installed file size of the
> files /usr/doc/* and twice the size of /usr/info/*, since these documents
> will be translated into HTML too. If this sum is greater than 500kbytes, a
> new package has to be set up.)
> Here is a list of packages (hopefully, all non-doc packages :), that would
> have to be split. The syntax is:
>      <package-filename>: <total-doc-size> (doc <size of /usr/doc/*>,
>                                            info <size of /usr/info/*>)
> Here is the list:
> ./devel/cvs_1.9-4.deb: 625 (doc 433,info 96)
> ./devel/ddd_2.1-3.deb: 700 (doc 700,info 0)
> ./devel/g77_0.5.20-1.deb: 510 (doc 42,info 234)
> ./devel/binutils_2.8.1-1.deb: 553 (doc 189,info 182)
> ./devel/libfcgi1-dev_1.5.1-1.deb: 564 (doc 564,info 0)
> ./devel/slib_2a6-1.deb: 672 (doc 482,info 95)
> ./devel/doc++_3.01-1.deb: 692 (doc 692,info 0)
>         -- the package includes a large ungzipped PostScript file!
> ./devel/gcc_2.7.2.2-4.deb: 807 (doc 79,info 364)
> ./devel/libg++27-dev_2.7.2.1-9.deb: 637 (doc 435,info 101)
> ./devel/libg++272-dev_2.7.2.5-1.deb: 637 (doc 435,info 101)
> ./editors/emacs_19.34-11.deb: 1830 (doc 42,info 894)
> ./editors/xemacs19-support_19.15-3.deb: 4654 (doc 0,info 2327)
> ./games/xconq_7.1.0-3.deb: 998 (doc 998,info 0)
> ./graphics/povray-misc_3.0.10-2.deb: 969 (doc 969,info 0)
> ./graphics/ucbmpeg_1r2-2.deb: 1512 (doc 1512,info 0)
> ./interpreters/gclinfo_2.2-4.deb: 1318 (doc 0,info 659)
> ./interpreters/scm_4e6-2.deb: 566 (doc 410,info 78)
> ./mail/mhonarc_2.0.1-1.deb: 793 (doc 793,info 0)
> ./math/calc_2.02f-1.deb: 980 (doc 36,info 472)
> ./math/gnuplot_3.5beta6.328-2.deb: 796 (doc 660,info 68)
> ./tex/latex2html_96.1.h-6.deb: 651 (doc 651,info 0)
> ./tex/tetex-base_0.4pl8-2.deb: 647 (doc 3,info 322)
> ./text/lout_3.08-1.deb: 3517 (doc 3517,info 0)
> ./utils/ftape-2.0.30_3.03a-1.deb: 581 (doc 473,info 54)
> ./web/arena_1.0b3-1.deb: 701 (doc 701,info 0)
> ./x11/9term_1.6.6-3.deb: 551 (doc 551,info 0)
> Thanks,
> Chris
> --                  Christian Schwarz
>                    schwarz@monet.m.isar.de, schwarz@schwarz-online.com
>                   schwarz@debian.org, schwarz@mathematik.tu-muenchen.de
>                 PGP-fp: 8F 61 EB 6D CF 23 CA D7  34 05 14 5C C8 DC 22 BA
>  CS Software goes online! Visit our new home page at
>  	                                     http://www.schwarz-online.com
> --
> TO UNSUBSCRIBE FROM THIS MAILING LIST: e-mail the word "unsubscribe" to
> debian-devel-request@lists.debian.org . 
> Trouble?  e-mail to templin@bucknell.edu .
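
For anyone wanting to re-check the figures in the quoted list, here is
a rough sketch (in Python) of the sizing rule as I read it from the
quoted text: total = doc size + twice the info size, and a package has
to be split when that total is greater than 500 kbytes.  The function
names and the regular expression are my own invention, not part of any
existing Debian tool, and the sample lines are copied from the list
above.

```python
import re

LIMIT_KB = 500  # threshold stated in the quoted proposal

# Matches list entries of the form
#   ./section/name_version.deb: TOTAL (doc DOC,info INFO)
ENTRY_RE = re.compile(
    r"^(?P<deb>\S+\.deb): (?P<total>\d+) "
    r"\(doc (?P<doc>\d+),info (?P<info>\d+)\)$"
)


def doc_total(doc_kb, info_kb):
    """Doc size plus twice the info size (info is counted again as HTML)."""
    return doc_kb + 2 * info_kb


def needs_split(doc_kb, info_kb, limit_kb=LIMIT_KB):
    """True if the documentation would have to go into its own package."""
    return doc_total(doc_kb, info_kb) > limit_kb


def parse_entry(line):
    """Parse one list entry into (filename, total, doc, info)."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognised entry: %r" % line)
    return (
        match.group("deb"),
        int(match.group("total")),
        int(match.group("doc")),
        int(match.group("info")),
    )


# A few entries copied from the quoted list.
sample = [
    "./devel/cvs_1.9-4.deb: 625 (doc 433,info 96)",
    "./devel/g77_0.5.20-1.deb: 510 (doc 42,info 234)",
    "./editors/emacs_19.34-11.deb: 1830 (doc 42,info 894)",
]

for line in sample:
    deb, total, doc_kb, info_kb = parse_entry(line)
    # The listed total should agree with the rule.
    assert doc_total(doc_kb, info_kb) == total
    print(deb, total, needs_split(doc_kb, info_kb))
```

Note that a package with little in /usr/doc can still cross the limit
purely because its info files are counted twice (g77 above is one such
case).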

John Goerzen          | Running Debian GNU/Linux (www.debian.org)
Custom Programming    | 
jgoerzen@complete.org | 

