
`doczipper' compression of documentation...


 I've written a perl script and some apache `mod_rewrite' (access.conf
 paste-in) code that will allow serving of gzipped html files from the
 documentation directory.  The `doczipper' program will compress them
 and fix up the `dpkg' package .list files and symlinks in the
 filesystem made to the original file name.  There's a simple shell
 script that acts as a serverlet when `%{HTTP:Accept-Encoding} !~
 /gzip/'; that is, if the client does not support gzip content
 encoding, the server will know that and zcat the document as it
 serves it.  When the package containing the now-compressed
 documentation is later upgraded, the compressed docs will be removed
 properly, since the .list database is updated by `doczipper'.  !!
 don't push C-c !!  How can I guard against the half-updated .list
 problem?  Hmmm... wheel in a wheel book... filesystems... I'll find
 it and work out a way. 8-{)>> Those files *are* fairly critical,
 after all...  for that reason alone, I would like a *thorough* peer
 code review prior to first upload.
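
 One common guard against the half-updated .list problem (a sketch of
 the technique, not what `doczipper' does today; the file names here
 are illustrative, not dpkg's real paths): write the new contents to a
 temporary file beside the original, flush it to disk, then rename(2)
 it into place.  Since rename is atomic on POSIX filesystems, readers
 see either the whole old .list or the whole new one, never a torn
 mixture.

```c
/* Atomic-replace sketch: temp file + fsync + rename.
 * "something.list.new" is illustrative, not a real dpkg path. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int
update_list (const char *path, const char *contents)
{
  char tmp[4096];
  FILE *fp;

  snprintf (tmp, sizeof tmp, "%s.new", path);
  fp = fopen (tmp, "w");
  if (fp == NULL)
    return -1;
  if (fputs (contents, fp) == EOF || fflush (fp) == EOF)
    {
      fclose (fp);
      unlink (tmp);
      return -1;
    }
  fsync (fileno (fp));          /* data must hit the disk first...    */
  fclose (fp);
  return rename (tmp, path);    /* ...then the atomic name swap.      */
}
```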

 I've begun writing a manual for it in sgml also.

 `doczipper' takes the `dpkg' lock file (and I should remember to add
 some code so that only one instance of it runs at a time).  I think
 that this is just the sort of thing to
 hang on a `dpkg' or `apt' "after-install-hook".  I'd like it if
 `dpkg' supported hook scripts, the way the emacsen support hook list
 variables.  That would help maximize its extensibility.  A library of
 perl module code could be developed / distilled for use by
 `dpkg-hooks', I surmise.
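
 The single-instance guard could be as small as an flock(2) on a lock
 file.  A sketch under that assumption follows; the lock path is made
 up for illustration, not dpkg's real lock:

```c
/* Only-one-instance guard via flock(2).  A second process (or a second
 * open of the same file) that tries LOCK_EX | LOCK_NB gets EWOULDBLOCK,
 * since flock locks belong to the open file description.
 * "test.lock" is an illustrative path. */
#include <sys/file.h>
#include <fcntl.h>
#include <unistd.h>

static int
acquire_single_instance_lock (const char *lockpath)
{
  int fd = open (lockpath, O_CREAT | O_RDWR, 0644);
  if (fd < 0)
    return -1;
  if (flock (fd, LOCK_EX | LOCK_NB) < 0)
    {
      close (fd);
      return -1;            /* another instance already holds it */
    }
  return fd;                /* keep fd open for the process lifetime */
}
```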

 Because it utilizes URL rewriting in the server, there is no need to
 fix up internal references in the HTML being compressed.  The client
 goes looking for "foo.html", which no longer exists since it's been
 compressed into "foo.html.gz".  The rewrite rules find the right one,
 transparently to the client, and send it off for the viewer.  I don't
 know what other web servers we've packaged support gzip content
 encoding and any form of URL rewriting.  I will make the package
 depend on Apache for the time being.  The sort of rewrite that has to
 be done for this would be simple to code up for other (possibly more
 lightweight) servers, I imagine.  It can use the filename extension
 to know it needs to send a content encoding header, and the URL
 rewriting is ultra simple.  If "foo.html" is found, serve it.  If
 not, try "foo.html.gz", and serve that if present.  You could write
 in something like:

   /*-*- c -*-*/
   #include <limits.h>          /* PATH_MAX */
   #include <string.h>
   #include <sys/stat.h>

   char filename[ PATH_MAX + 1 ];
   size_t len;
   int status;
   struct stat statbuf;

   strncpy (filename, "foo.html.gz", PATH_MAX);
   filename[ PATH_MAX ] = '\0';   /* strncpy won't always terminate */
   len = strlen (filename);

   /* Try the uncompressed name first: truncate at the ".gz". */
   filename[ len - 3 ] = '\0';
   status = stat (filename, &statbuf);
   if (status == 0)
     return (serve (filename));

   /* Not there; restore the '.' and try the compressed twin. */
   filename[ len - 3 ] = '.';
   status = stat (filename, &statbuf);
   if (status == 0)
     return (serve (filename));

   return (errorwhatever ());
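
 For Apache itself, rules along these lines would do it.  This is an
 illustrative sketch, not the exact `access.conf' paste-in the package
 installs:

```apache
# If the requested file exists, serve it untouched; otherwise, if a
# gzipped twin exists, rewrite to it and let AddEncoding mark it with
# the right content encoding.  (Illustrative, not doczipper's rules.)
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME}.gz -f
RewriteRule ^(.*)$ $1.gz [L]
AddEncoding x-gzip .gz
```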

 It will take me a few more days of work and testing to get it nailed
 down and glue-dried well enough to let others try it.  It's in my CVS
 here if anyone wants a look; email me for the access info.  It worked
 six or eight months ago when I first wrote it, but I've redesigned
 some, having learned a little more about using `mod_rewrite', &c.  It
 won't run today, but should soon.

 Any ideas on this?  Keywords I could use for an archive search for
 relevant material?  Which lists?  What else should I read regarding
 this sort of thing that might be more knowledge dense than email?

 Karl M. Hegbloom <karlheg@debian.org>