Re: fixhrefgz - tool for converting anchors to gzipped files
Thanks, Lars, for the tool.
I wrote exactly the same thing in Perl (on your request!) some time ago. I
have attached it to this mail.
I don't know which version is better. It looks like Lars' implementation
hard-codes a lot of HTML tags for processing. Mine is based on Perl's
HTML::Parser class, so it needs no list of specific HTML tags: everything
except <a href="..."> is passed through unchanged.
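For comparison, here is a minimal sketch of the same pass-through idea written against the callback-style handler API of HTML::Parser 3.x (an assumption: the attached script uses the older subclass interface instead). The passthrough function and its callback are hypothetical names. Every token is copied verbatim, and only <a> start tags are handed to the callback, which returns the text to emit in their place.

```perl
#!/usr/bin/perl
# Sketch only: assumes the HTML::Parser 3.x handler API (api_version 3).
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: copy $html through unchanged, except that each
# <a ...> start tag is replaced by whatever $on_a returns for it.
sub passthrough {
    my ($html, $on_a) = @_;
    my $out = '';
    my $p = HTML::Parser->new(
        api_version => 3,
        # Events without their own handler (text, end tags, comments,
        # declarations, ...) are copied verbatim.
        default_h => [ sub { $out .= $_[0] if defined $_[0] }, 'text' ],
        # Start tags: <a> goes to the callback, all others are copied.
        start_h => [
            sub {
                my ($tag, $attr, $attrseq, $text) = @_;
                $out .= $tag eq 'a' ? $on_a->($attr, $attrseq, $text) : $text;
            },
            'tagname,attr,attrseq,text',
        ],
    );
    $p->parse($html);
    $p->eof;
    return $out;
}

# Usage: a callback that returns the original tag text ($_[2]) leaves
# the document byte-for-byte intact.
my $html = qq{<p>See <A HREF="foo.html">foo</A>.</p>\n};
print passthrough($html, sub { $_[2] });
```

Because no tag names are listed anywhere, new or unknown tags need no special treatment; only the <a> callback decides what changes.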
Thanks,
Chris
-- 
Christian Schwarz
schwarz@monet.m.isar.de, schwarz@schwarz-online.com,
schwarz@debian.org, schwarz@mathematik.tu-muenchen.de
PGP-fp: 8F 61 EB 6D CF 23 CA D7 34 05 14 5C C8 DC 22 BA

Debian is looking for a logo! Have a look at our drafts
at http://fatman.mathematik.tu-muenchen.de/~schwarz/debian-logo/
#!/usr/bin/perl
#
# fixhtmlgz 0.2
# Copyright (c) 1997 by Christian Schwarz <schwarz@monet.m.isar.de>
# May be distributed under GPL 2.
#
# Specification:
#
# Currently, we have a problem with compressed HTML: we can access
# compressed HTML fine, but links don't work very well. The problem
# is that the link says "foo.html", and the actual file is "foo.html.gz",
# and the browsers and servers aren't intelligent enough to handle
# this invisibly. This means that we can't install compressed HTML if
# it contains links.
#
# We need a program that can be run on uncompressed HTML, which converts
# local links to the compressed versions of the files. Usage would
# be something like:
#
# fixhtmlgz file.html ...
#
# - read file.html
# - for each link <a href="foo.html">, if foo.html exists,
# convert the link to foo.html.gz instead
# - otherwise, do not modify the link
# - output is either to file.html.fixed or file.html (replace
# original with modified version)
#
# Changes:
# v0.2:
# - now handles gzipped files
# - parse .html and .htm files
# - changed replacing rule: change href to refer to the
# file, as it actually exists. Example:
# <a href="foo.html"> will only be converted to
# foo.html.gz if foo.html.gz exists, not merely
# if foo.html exists.
#
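package Parser; #-------------------------------
require HTML::Parser;
@ISA = qw(HTML::Parser);
use File::Basename qw(dirname);
use HTML::Entities qw(encode_entities);

# Remember the directory of the file being parsed, so that relative
# links are checked against it rather than the current directory.
sub parse_file {
    my ($self, $file) = @_;
    $self->{basedir} = dirname($file);
    return $self->SUPER::parse_file($file);
}

sub declaration {
    my ($self, $decl) = @_;
    print ::OUT "<!$decl>";
}

sub start {
    my ($self, $tag, $attr, $attrseq, $origtext) = @_;
    my $href = $$attr{'href'};
    # Only local links: no URL scheme at all, or an explicit "file:" one.
    if ($tag eq 'a' and defined $href
        and ($href !~ /^[a-z][a-z0-9+.\-]*:/i or $href =~ /^file:/i)) {
        # Split into optional "file:" prefix, path and optional "#anchor".
        my ($type, $path, $anchor) = $href =~ /^(file:)?([^#]*)(#.*)?$/is;
        $type   = '' unless defined $type;
        $anchor = '' unless defined $anchor;
        my $file = $path =~ m{^/} ? $path : "$self->{basedir}/$path";
        # Rewrite .html/.htm links only if the .gz file actually exists.
        if ($path =~ /\.html?$/ and -f "$file.gz") {
            $$attr{'href'} = "$type$path.gz$anchor";
            # Rebuild origtext. Attribute values arrive entity-decoded,
            # so they are re-encoded before being written back.
            $origtext = "<a";
            for my $name (@$attrseq) {
                my $value = $$attr{$name};
                $origtext .= defined $value
                    ? " $name=\"" . encode_entities($value, '<>&"') . '"'
                    : " $name";
            }
            $origtext .= ">";
        }
    }
    print ::OUT $origtext;
}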
sub end {
    my ($self, $tag, $origtext) = @_;
    # Copy the end tag exactly as it appeared in the input.
    print ::OUT $origtext;
}
sub text {
    my ($self, $text) = @_;
    print ::OUT $text;
}
sub comment {
    my ($self, $comment) = @_;
    print ::OUT "<!--$comment-->";
}
#########################################################################
package main;
unless (@ARGV) {
    print STDERR "usage: fixhtmlgz <html file> ...\n";
    exit 1;
}
my $p = Parser->new;
for my $filename (@ARGV) {
    if ( ! -f $filename ) {
        print STDERR "error: file $filename not found, skipping.\n";
        next;
    }
    # Write the fixed version to file.fixed, keep the original as
    # file.bak, then move the fixed version into place.
    my $output = "$filename.fixed";
    open(OUT, '>', $output) or die "cannot open output file $output: $!";
    $p->parse_file($filename) or die "cannot parse $filename: $!";
    close(OUT) or die "cannot close $output: $!";
    rename($filename, "$filename.bak") or die "cannot rename $filename: $!";
    rename($output, $filename) or die "cannot rename $output: $!";
}
exit 0;
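The replacement rule from the v0.2 changelog (rewrite foo.html to foo.html.gz only if foo.html.gz exists, and leave http:, mailto: and other non-file links alone) can be tried out on its own. This is a minimal sketch built around a hypothetical gz_href helper that takes the existence test as a callback, so it runs without real files; the script itself tests the file system directly with -f.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of fixhtmlgz: apply the v0.2 rule to one
# href value. $exists decides whether a path exists (e.g. sub { -f $_[0] }).
sub gz_href {
    my ($href, $exists) = @_;
    # Links with a scheme other than "file:" (http:, ftp:, mailto:, ...)
    # are never touched.
    return $href if $href =~ /^[a-z][a-z0-9+.\-]*:/i and $href !~ /^file:/i;
    # Split into optional "file:" prefix, path and optional "#anchor".
    my ($type, $path, $anchor) = $href =~ /^(file:)?([^#]*)(#.*)?$/is;
    $type   = '' unless defined $type;
    $anchor = '' unless defined $anchor;
    # Only .html/.htm links whose .gz counterpart exists are rewritten.
    return $href unless $path =~ /\.html?$/ and $exists->("$path.gz");
    return "$type$path.gz$anchor";
}

# Usage with a fake file list instead of the real file system:
my %have = map { $_ => 1 } qw(foo.html.gz doc/bar.htm.gz);
my $exists = sub { $have{ $_[0] } };
print gz_href($_, $exists), "\n"
    for 'foo.html#intro', 'baz.html', 'http://www.debian.org/foo.html';
# prints foo.html.gz#intro, baz.html and the http: link unchanged
```

Passing the existence test in as a callback keeps the rule separate from the file system, so the corner cases (anchors, file: prefixes, .htm files, missing .gz files) can be checked without creating any files.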