
`boa' vs `apache', running from x?inetd.conf, mime-types, mailcap



[can we start putting Keywords: headers in longer reports, for the
archive server? (it is smartlist, right?)]

(If you use Gnus, press {W e} now.)

----------------------------------------------------------------------

*Boa may not be our best option.*


_Gzipped HTML files:_

    Christian> One question remains: Is it possible to browse
    Christian> "html.gz" files _without_ a CGI script with the usual
    Christian> HTML browsers (Netscape, lynx)?  If so, we'll make it
    Christian> policy to gzip all html files and to adapt the
    Christian> references.  If not, we'll have to install all html
    Christian> files gzipped--or add a cgi-capable web server to the
    Christian> base system.

 With `boa', none of the links will need to be rewritten.  They can
point at ".../file.html", and the `boa' server will look for that.  If
it cannot find it, it tries again with a .gz extension tacked on, and
sends that if it's found.

*`apache' can be made to do this search/rewrite also.*  It can be
done with mod_rewrite, which I am reading about now; there is a
sketch of it further down.

 The browser's ability to view compressed HTML depends on correct
entries in "/etc/mailcap".  Browsers all use that file to find the
program they need to run to handle a particular MIME type.

from "/etc/mailcap":
# ----- User Section Begins ----- #
application/x-gzip; /bin/gunzip -c %s; test=true; description=GNU zip; nametemplate=%s.gz
application/x-compress; /bin/gunzip -c %s; test=true; description=UNIX compress; nametemplate=%s.Z
# -----  User Section Ends  ----- #
 This works fine with apache (running from xinetd), but *NOT* with
boa.  With the identical mailcap, and the same file served from `boa',
I get an error dialog.  (Why?)  I tested with both Netscape 3 and W3.

 For some reason, serving gzipped files from `boa' (Version: 0.92-5)
does not work, given mailcap entries that function fine with apache
served documents.  Can anyone verify this?

 If we want to uncompress things /before/ transmission, it would be
simple for `dwww' or any other doc-serving CGI engine to run things
through `gunzip' prior to sending the page to the browser.  The
better way to do this, though, might be to use mod_actions in Apache.
If you grab the apache-dev source from "bo/source/web" and have a
look in the Configuration file, you'll find:

## The asis module implemented ".asis" file types, which allow the embedding
## of HTTP headers at the beginning of the document.  mod_imap handles internal 
## imagemaps (no more cgi-bin/imagemap/!).  mod_actions is used to specify 
## CGI scripts which act as "handlers" for particular files, for example to
## automatically convert every GIF to another file type.

# Module asis_module         mod_asis.o
# Module imap_module         mod_imap.o
Module action_module       mod_actions.o
... so it looks like it won't be too difficult to set up a perl script
that turns ".html.gz" files into text/html output for the browser, so
that it won't have to send application/x-gzip.  I don't think that's
the best solution, though, since sending gzipped data will save
bandwidth.
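
 Here is a rough, *untested* sketch of how that might look, done with
a shell script rather than Perl for brevity; the type name and script
path are made up for illustration.  (Note that the AddType catches
every ".gz" file, not just ".html.gz"; good enough for a sketch.)

# Map ".gz" to a made-up type, and name a CGI script as its handler:
AddType application/x-gzipped-html .gz
Action application/x-gzipped-html /cgi-bin/unzip-html

 ... and the handler itself; mod_actions hands it the requested file
in $PATH_TRANSLATED:

#!/bin/sh
# /cgi-bin/unzip-html: decompress the requested file, send it as HTML.
echo "Content-type: text/html"
echo ""
exec gunzip -c "$PATH_TRANSLATED"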

 *Now we need to figure out how to make it check for a file with a
.gz extension when the one without was not found, the way `boa'
does.*  I am reading about mod_rewrite; it looks like exactly what is
needed.  It could solve the doc team's problem with URLs in
documentation as well, since it can perform tests for the existence
of a file.  (This must have been discussed... I should look in the
archive.)
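
 A first, *untested* guess at the rules, going by what the
mod_rewrite documentation describes:

# If the file asked for doesn't exist, but a ".gz" neighbour does,
# quietly rewrite the request to point at the compressed file:
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME}.gz -f
RewriteRule ^(.+)$ $1.gz [L]

# ... and mark ".gz" files so the encoding header is sent along:
AddEncoding x-gzip .gz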

 There can be .htaccess files in the /usr/doc directories.

_Running the server from `inetd' or `xinetd'._

 *`boa' /cannot/ be run from `inetd' or `xinetd' in its present
incarnation.*  It needs to have a configuration option added, like
`apache' has, so that it will exit after serving a series of requests
when it is launched from inetd.  Right now, it must run as a daemon.
I don't think there is a way to ask it not to answer requests from
outside an authorized realm, either; that is one more advantage of
running the httpd from inetd.

 *Apache can be run in `inetd' mode*, by setting an option in the
configuration file and adding an entry to "/etc/inetd.conf" or
"/etc/xinetd.conf".  With the tcpwrapper or `xinetd', it is possible
to disallow access to the server from outside your domain.  If you
are concerned about that, and many users will be, then *use xinetd*
and set the `only_from' option for the WWW service.  *It works very
well.*  Alternatively, use the tcpwrapper.
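
 For plain `inetd', the setup would be roughly this (untested here;
check the Apache documentation):

# In the Apache configuration file:
ServerType inetd

# In /etc/inetd.conf, all on one line; put /usr/sbin/tcpd in front of
# the server path if you want the tcpwrapper:
www stream tcp nowait www-data /usr/sbin/apache apache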

 Here's an example of the setup in "/etc/xinetd.conf", followed by
the logging it produces:

service www
{
	socket_type	= stream
	protocol	= tcp
	wait		= no
	instances	= 8
	user		= www-data
	flags		= IDONLY
	only_from	= 127.0.0.1 206.129.216.38 206.129.216.1
	log_type	= SYSLOG daemon
	log_on_success	= PID HOST USERID EXIT DURATION
	log_on_failure	= HOST USERID
	server		= /usr/sbin/apache
}

Jun 29 00:51:45 bittersweet xinetd[252]: START: www pid=25479 from=206.129.216.38
Jun 29 00:51:45 bittersweet xinetd[252]: START: ident pid=25480 from=206.129.216.38
Jun 29 00:51:45 bittersweet xinetd[25479]: USERID: www UNIX : karlheg
Jun 29 00:51:45 bittersweet xinetd[252]: EXIT: ident status=0 pid=25480 duration=0(sec)
Jun 29 00:51:46 bittersweet xinetd[252]: EXIT: www status=0 pid=25479 duration=1(sec)
 Perhaps we can put together a minimal Apache for the base set,
similar to how a minimal Perl is provided for it?  It could be
replaced by the full-blown package at the installer's option.

 The minimal version could have the modules it needs statically
compiled into it, or load them as modules, the way it's configured
now.

    Fernando> A web server adds a lot of flexibility. Boa adds very
    Fernando> little overhead. Try it and only then say something.

 Try running Apache from `inetd' and see what you think of that, also.

 I set `top' to redisplay every second, and then hit:

http://localhost/doc/

 ... which is a very large directory.  Apache starts right up, even
configured with dynamically loaded modules, and goes away again right
after it's done, freeing all of the resources it borrowed.  The disk
didn't seem to get hit any more than with `boa', from a purely
subjective standpoint.  (I have lots of RAM.  On a low-RAM system, it
would maybe page some.)  The sizes of the processes are:

USER       PID %CPU %MEM   VSZ    RSS  TT STAT  START    TIME COMMAND
root       252  0.0  0.6  1040    420  ?  S    Jun 22    0:00 /usr/sbin/xinetd
www-data 25056  0.0  1.1  1232    756  ?  R    23:54     0:00  \_ apache

USER       PID %CPU %MEM   VSZ    RSS  TT STAT  START    TIME COMMAND
www-data 25738  0.0  0.7   908    456  p9 S    01:54     0:00 /usr/sbin/boa

... here you can see that `boa' really isn't /that/ much smaller than
`apache', and must run all the time.  On a system with only 4Mb of
RAM, it will need an extra bit of swap space, and grind just a bit.
No big deal.  And when the page is served, it frees it up again.

Apache has several very nice features that make it our best choice.

 It works.  Gzipped files served by apache are displayed by the
browsers I tested.

 You can put HEADER.html and README.html files in a directory, and
apache will put those at the top and bottom of a server-generated
index.

 It deals with content negotiation and languages other than
English.  (Does `boa' do that?  I only speak English.)  There's a
small example of the language setup after this list.

 It is the de facto industry-standard server software.  Apache is
very popular.  The commercial `StrongHold' SSL server is based upon
it.

 It is very flexible, modular, extensible, and quite configurable.
It can do the URL rewriting (much as sendmail does address rewriting)
that will be required to link all of the SPI documentation.
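
 The language handling, for instance, is only a few lines of
configuration (roughly like this; check the defaults that ship with
the package):

# Name the language variants, and pick an order of preference:
AddLanguage en .en
AddLanguage fr .fr
LanguagePriority en fr

 With `Options MultiViews' turned on, a request for "foo.html" can
then be answered with "foo.html.en" or "foo.html.fr", whichever the
browser asks for.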


    Fernando> In all cases mentioned, the overhead of on-the-fly
    Fernando> conversion is acceptable enough, just a little slower
    Fernando> than formatting man pages.

 And cached, like manual pages and `dwww' pages, with a cron job to
remove old ones.  The caching behaviour should be optional, for low
disk space setups.
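
 The cron job could be as simple as this sketch (the cache directory
is a guess; wherever the converter keeps its output):

# /etc/cron.weekly/doc-cache -- expire converted pages not read in a week:
find /var/cache/dwww -type f -atime +7 -exec rm -f {} \;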

    Fernando> 3) Documents in markup format for which no on-the-fly
    Fernando> conversion is available will be included in both
    Fernando> pre-processed HTML and original format. This is a last
    Fernando> resort measure.

 Agreed.

    Fernando> Original format should always be included. Reasons: 1)
    Fernando> To produce printed copies. 2) Because I hate
    Fernando> Ghostscript and Xdvi. I prefer reading the markup
    Fernando> directly (and I am not alone.) 3) Because users might
    Fernando> want to process the documents automatically (search
    Fernando> engines for example) HTML should be included so that
    Fernando> the documents are cleanly integrated with the rest of
    Fernando> the documentation and for serving them to remote (W95)
    Fernando> systems if necessary via http.

 `gv' and Aladdin Ghostscript rule!  (And DEC SRC's Virtual Paper, if
you own a Zip drive or have plenty of hard disk.)

    Fernando> 4) Documents originally in binary format (PS, DVI, PDF,
    Fernando> MS-WORD) for which no conversion is possible should be
    Fernando> packaged separately. A file explaining how to get the
    Fernando> documentation (including which programs the user will
    Fernando> need: ghostscript, xdvi, MS Word) and a brief summary of
    Fernando> the document should be included in the binary package in
    Fernando> HTML format, or a convertible one.

 Ok with me...

    Fernando> Binary documents are useful mainly for printing and
    Fernando> they usually have a huge size to information ratio. I
    Fernando> hate storing junk in my systems. Online viewing of
    Fernando> binary documents is awful. I hate downloading a 1MB
    Fernando> file just to find out it does not answer my
    Fernando> questions. Since developers must read the document
    Fernando> anyway, they could make a brief summary. Sometimes that
    Fernando> would be enough to decide whether it is worth
    Fernando> downloading the full document or not.  Binary docs can
    Fernando> not be integrated with the rest, they should be
    Fernando> discouraged.  Authors should be encouraged to give out
    Fernando> the document in the original format in which they wrote
    Fernando> it, which seldom is a binary one, except for MS Word.

 So we should package large doc files in separate -doc packages.  Ok.
That way, if I want to know what-in-the-heck-is-sendmail, I can grab
the doc package for it, and read that, without installing the program
itself.  I think that's the Right Way to do things. ---Not just large
`binary' docs, but all docs, *especially for major programs.*

 Maybe it would be good to be able to extract just the conffiles from
a .deb too, for having a look at them prior to installation... in a
suggested and documented way.  (Part of "how to get started learning
Linux".)
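
 `dpkg-deb' can nearly do this already, I think; something along
these lines (the package and file names here are only placeholders):

# List which files the package declares as conffiles:
dpkg-deb --info package.deb conffiles

# Pull one of them out of the archive for a look, without installing:
dpkg-deb --fsys-tarfile package.deb | tar -xOf - ./etc/example.conf | less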

    Fernando> 1) The default format for online documentation is
    Fernando> HTML. A web browser (lynx) and a very small web server
    Fernando> (boa) will be in the core distribution, marked
    Fernando> important.
	[...]
    Fernando> 5) The man program should be marked optional. When a
    Fernando> user types "man something" and man is not installed,
    Fernando> lynx would automatically be invoked and it would present
    Fernando> the HTML-converted man page the user requested.

 Perhaps the standalone `info' reader could also launch `lynx', in a
similar way to how it can grab regular man pages now?  (Or should.
The one that comes with RedHat 3.0.3 seems to work; my copy does not.
On the redhat machine at my ISP, I can type `info man' and get the
man page for man.)

 I suppose what everyone may say is that lynx can hit `dwww', which
will also _display_ info files.  Display is about all it does,
though; it removes much of the functionality of Info.

 The *best* way to read info and manuals is inside XEmacs. :-)  Try
it long enough to really find out what it's capable of, and only then
say something.  It has a web browser inside it too, and works just
fine on a tty.  You don't /have/ to have X to use it, if it's
compiled right; AFAIK it cannot be linked against X libs if they
aren't installed.  Emacs is great as well, for machines without the
resources to run XEmacs, which needs at the very least 12Mb RAM under
X-Windows, which itself needs a minimum of 8Mb to run well.


    Fernando> This is almost as fast as original man but much more
    Fernando> powerful. As it has little overhead, it can replace the
    Fernando> man program. But if a user still thinks he can't stand
    Fernando> the small overhead, man can optionally be installed.
    Fernando> Man depends on groff. That's a huge overhead in both
    Fernando> size and speed.  Nowadays no one writes groff documents
    Fernando> other than man pages. However, groff is needed for
    Fernando> printing man pages, but it is a bloated solution for
    Fernando> online consultation.

 In Emacs or XEmacs, you type {M-x man} and enter a man page name.
It runs the man command, and displays the formatted result.  `man'
can be used like that, as can `lynx'.  You can follow links to other
man pages with the mouse--- they highlight when you fly over them.
(I like info the best.  W3 complements it quite nicely.)

    Fernando> 6) The info program should be marked optional. When it
    Fernando> is installed, it would compile the texinfo files and
    Fernando> place the output in /usr/info. When it is deinstalled,
    Fernando> it would erase the /usr/info directory. There will be a
    Fernando> hook in dwww to register texinfo pages so that the info
    Fernando> directory is kept always current.

 No!  You cannot just erase that directory if Emacs or XEmacs are
installed.  I suggest looking over info once again, and finding out
more about its design and contents.  Try `libc-mode' in emacs, and
info lookup in cperl mode.  They are invaluable.

    Fernando> The preferred online way of viewing texinfo files is
    Fernando> through the texinfo to HTML on-the-fly converter. Info
    Fernando> fans who prefer the crappy info interface should still
    Fernando> be able to install info files, but without imposing
    Fernando> them on everyone. The info format is awful. Texinfo is
    Fernando> nicer. Texinfo->HTML is optimal. Emacs fans can use the
    Fernando> w3 mode for viewing texinfo files. Or they can install
    Fernando> info and use the info mode if they want. For other
    Fernando> people, just texinfo is enough.

 You don't know what you're talking about, IMO.  I suggest you read
the help and tutorial for using `info'.  The interface is *not*
"crappy".  It is very powerful.  Sometimes the keys seem a little odd;
they work on any keyboard ever invented though.  It is much more
powerful than lynx or any web browser viewing html.  I'll bet that
info is less resource-intensive as well, being preformatted.

 Preformatted, searchable, hyperlinked tree structure, programmable
(inside emacs), texinfo can be printed or made into info files...
TeXinfo was designed by men who have studied and used computers for a
long time.

 While I'm browsing C or perl source, I can press a few keys, and have
the info manual to a libc or perl function in a second window in under
a second.  (I can put the cursor on a word, and with a few keystrokes,
have W3 fetch its definition from the online dictionary, too.)

    Fernando> 7) A default searching/indexing engine should be
    Fernando> chosen. It would be marked standard, but not
    Fernando> important. Caching would be an option too.

 Yes.  Jim Pick's `dwww' is the best one going.  There's no reason why
info files cannot be indexed also.  With `gnuclient', info URIs could
be opened in an emacs if that is the user's preference.

(Koalatalk someday. ;-) )

-- 
mailto:karlheg+sig@inetarena.com (Karl M. Hegbloom)
http://www.inetarena.com/~karlheg
Portland, OR  USA
Debian GNU 1.3  Linux 2.1.36 AMD K5 PR-133
