WARC-file Incompatibility of the Debian web sites
WARC files have their origins at the Internet Archive.
They are essentially a persistent hash table of the form
key --- <URL the way it is in the Wild-Wild-Web>
value --- <the file that was served for that URL>
http://www.digitalpreservation.gov/formats/fdd/fdd000236.shtml
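To illustrate the key/value view described above, here is a minimal
sketch in Python that reads a .warc.gz file with only the standard
library and maps each archived URL to the bytes stored for it. The
function names iter_warc_records and build_index are made up for this
example and are not part of grab-site or warc-proxy. The sketch assumes
well-formed records with a Content-Length header and ignores details
such as header line folding, so it is a rough diagnostic aid rather
than a full WARC parser.

```python
import gzip
import io


def iter_warc_records(stream):
    """Yield (headers, body) for each record in an uncompressed WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank lines separate consecutive records
        if not line.startswith(b"WARC/"):
            raise ValueError("not a WARC record header: %r" % line[:40])
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # an empty line ends the header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        # The record body is exactly Content-Length bytes long.
        length = int(headers.get("content-length", "0"))
        body = stream.read(length)
        yield headers, body


def build_index(path):
    """Map each archived URL (key) to the raw HTTP response stored for it (value)."""
    index = {}
    # gzip.open reads multi-member gzip files, which is how .warc.gz is usually written.
    with gzip.open(path, "rb") as stream:
        for headers, body in iter_warc_records(stream):
            if headers.get("warc-type") == "response":
                index[headers.get("warc-target-uri")] = body
    return index
```

Calling build_index() on a WARC file and printing the sorted keys
gives a quick list of which URLs the archive actually contains, which
may help to tell a broken archive apart from a viewer problem.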
The issue with the current Debian sites seems to be
that tools like
https://github.com/ludios/grab-site
create files like (~30MiB)
http://temporary.softf1.com/2017/bugs/www.debian.org-devel-2016-12-28-ec5f8b13-00000.warc.gz
that fail to be viewed with a tool like
https://github.com/alard/warc-proxy
With the exception of large files
https://github.com/alard/warc-proxy/issues/5
warc-proxy actually works fine. The WARC
creation and viewing tools that I use can be downloaded from
(~9MiB)
http://archive.softf1.com/2016/software/2016_12_xx_WARC_tools.tar.xz
However, some sites, including the Debian web sites,
fail to be "WARC-able". It would be nice if this were fixed,
especially given that one never knows when
something will become censored. Please keep in mind that
there is no limit to the absurdity of censorship.
Some day photos of pigeons might be banned, because
maybe some religious sect or political party finds
them offensive or otherwise endangering its ability
to keep the dumb ones working as slaves for it, paying taxes, etc.
warc-proxy works fine with files of ~200MiB,
which means that the aforementioned
http://temporary.softf1.com/2017/bugs/www.debian.org-devel-2016-12-28-ec5f8b13-00000.warc.gz
is not "too big".
Regards,
Martin.Vahi@softf1.com