
`doczipper' compression of documentation...



 I've written a perl script and some apache `mod_rewrite' (access.conf
 paste-in) code that allow gzipped html files to be served from the
 documentation directory.  The `doczipper' program compresses them and
 fixes up the `dpkg' package .list files, along with any symlinks in
 the filesystem that point to the original file names.  There's a
 simple shell script that acts as a serverlet when
 `%{HTTP:Accept-Encoding} !~ /gzip/'; that is, if the client does not
 support gzip content encoding, the server detects that and zcats the
 document as it serves it.  (See the sketch of the rewrite rules
 below.)

 When the package containing the now-compressed documentation is later
 upgraded, the compressed docs will be removed properly, since the
 .list database is updated by `doczipper'.  !! don't push C-c !!  How
 can I guard against the half-updated .list problem?  Hmmm... wheel in
 a wheel book... filesystems... I'll find it and work out a way.
 8-{)>>  Those files *are* fairly critical, after all; for that reason
 alone, I would like a *thorough* peer code review prior to first
 upload.
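
 The access.conf paste-in boils down to something like the following
 (an illustrative sketch, not the exact rules from my CVS; the
 `zcat-serverlet' location is a placeholder):

   # Label *.gz files so Apache sends a gzip Content-Encoding header.
   AddEncoding x-gzip .gz
   RewriteEngine on

   # Client groks gzip: silently map foo.html -> foo.html.gz.
   RewriteCond %{HTTP:Accept-Encoding} gzip
   RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}.gz -f
   RewriteRule ^(.+\.html)$ $1.gz [L]

   # Client does not: route the request through the zcat serverlet,
   # which is little more than a CGI wrapper around `zcat'.
   RewriteCond %{HTTP:Accept-Encoding} !gzip
   RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}.gz -f
   RewriteRule ^(.+\.html)$ /cgi-bin/zcat-serverlet$1 [PT,L]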
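
 On the half-updated .list question, one classic answer (the standard
 filesystem trick, nothing dpkg-specific) is to never rewrite the file
 in place: build the replacement beside it, flush it to disk, and
 rename(2) it over the original.  The rename is atomic on POSIX
 filesystems, so a reader sees either the whole old file or the whole
 new one, never a torn mixture.  A sketch, in C for illustration (the
 helper name is made up):

   #include <stdio.h>
   #include <unistd.h>

   /* Replace `path' with the fully-written temp file `tmppath'.
      The rename is the atomic switch; an interrupt (C-c!) before it
      leaves the original .list untouched.  */
   int
   replace_file (const char *path, const char *tmppath, FILE *tmp)
   {
     if (fflush (tmp) != 0)          /* push stdio buffers out...  */
       return -1;
     if (fsync (fileno (tmp)) != 0)  /* ...and onto the disk       */
       return -1;
     if (fclose (tmp) != 0)
       return -1;
     return rename (tmppath, path);
   }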

 I've also begun writing a manual for it in SGML.

 `doczipper' takes the `dpkg' lock file (and I must remember to put
 some code into it so that only one instance of it can run at a time;
 see the sketch below).  I think that this is just the sort of thing
 to hang on a `dpkg' or `apt' "after-install-hook".  I'd like it if
 `dpkg' supported hook scripts, the way the emacsen support hook list
 variables; that would help maximize its extensibility.  A library of
 perl module code could be developed / distilled for use by
 `dpkg-hooks', I surmise.
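
 The single-instance guard could look something like this (a C sketch
 for illustration; `doczipper' itself is perl, and the lock file name
 here is made up):

   #include <stdio.h>
   #include <stdlib.h>
   #include <fcntl.h>
   #include <sys/file.h>

   int
   main (void)
   {
     /* Take an exclusive, non-blocking lock; a second instance fails
        to get it and exits instead of stepping on the first.  The
        lock evaporates when the process exits.  */
     int fd = open ("/var/run/doczipper.lock", O_RDWR | O_CREAT, 0644);
     if (fd < 0 || flock (fd, LOCK_EX | LOCK_NB) < 0)
       {
         fprintf (stderr, "doczipper: another instance running?\n");
         exit (1);
       }
     /* ... do the real work here ... */
     return 0;
   }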

 Because it uses URL rewriting in the server, there is no need to fix
 up internal references in the HTML being compressed.  The client goes
 looking for "foo.html", which no longer exists since it's been
 compressed into "foo.html.gz".  The rewrite rules find the right one,
 transparently to the client, and send it off to the viewer.  I don't
 know which other web servers we've packaged support gzip content
 encoding and some form of URL rewriting, so I will make the package
 depend on Apache for the time being.  The sort of rewrite needed here
 would be simple to code up for other (possibly more lightweight)
 servers, I imagine.  The server can use the filename extension to
 know it needs to send a content encoding header, and the URL
 rewriting is ultra simple: if "foo.html" is found, serve it; if not,
 try "foo.html.gz", and serve that if present.  You could write
 something like:

   /*-*- c -*-*/
   #include <string.h>
   #include <limits.h>
   #include <sys/stat.h>

   char filename[ PATH_MAX + 1 ];
   int len;
   int status;
   struct stat statbuf;

   strncpy (filename, "foo.html.gz", PATH_MAX);
   filename[ PATH_MAX ] = '\0';   /* strncpy does not guarantee this */
   len = strlen (filename);

   filename[ len - 3 ] = '\0';    /* "foo.html.gz" -> "foo.html"     */
   status = stat (filename, &statbuf);
   if (status == 0)
     return (serve (filename));   /* plain file exists; just send it */

   filename[ len - 3 ] = '.';     /* restore "foo.html.gz"           */
   status = stat (filename, &statbuf);
   if (status == 0)
     return (serve (filename));   /* send with Content-Encoding      */

   return (errorwhatever ());

 It will take me a few more days of work and testing to get it nailed
 down and glue-dried well enough to let others try it.  It's in my CVS
 here if anyone wants a look; email me for the access info.  It worked
 six or eight months ago when I first wrote it, but I've since
 redesigned it some, having learned a little more about using
 `mod_rewrite', &c.  It won't run today, but should soon.


 Any ideas on this?  Keywords I could use for an archive search for
 relevant material?  Which lists?  What else should I read regarding
 this sort of thing that might be more knowledge-dense than email?


 Karl M. Hegbloom <karlheg@debian.org>