On Wednesday 19 May 2010 07:49:51 Kim Holviala wrote:
> On 2010-05-18 23:01, Florian Teply wrote:
> > I myself started (and am still somewhere midway, I guess) to clean out the
> > archive. As I went through the archive, I noticed quite a few empty
> > files, most notably .gophermap and robots.txt, which had a size of zero
> > bytes.
>
> I noticed the very same thing, so the first thing I did was:
>
> # find archive/ -type f -size 0 -exec rm -v "{}" \;
>
> Followed by:
>
> # find archive/ -type d -exec rmdir -v "{}" \;
>
> In case you're wondering, the second removes empty directories only :-).
>

I basically did the same thing, just a little more complicated, as I used the
output of find in a for loop. I had to run it several times, though, because a
directory whose only content is an empty directory doesn't count as empty
itself.

> > the way I also stumbled across some other files that are, in my opinion,
> > bogus: HTTP GET requests that returned a 404 error.
>
> There are also loads of gophermaps which only contain "3Error of some
> kind".
>

Yeah, saw those too.

> > gopher I removed all of those. Next thing I'll try is restoring the
> > gophermap files, where available, to point to the mirror instead of the
> > original server. Could use some help there, though. Any hints on that
> > one, Kim maybe?
>
> I was lazy... So I'm doing it one site at a time: one server hosts
> one archived site using Gophernicus with some extra command line
> switches, and a second site spiders the "live site" using the hosts file
> to fix the IP address...

Maybe we should coordinate that a little, as I see no point in two guys doing
the same grunt work twice...

Florian
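The repeated runs needed for nested empty directories can probably be avoided. A minimal sketch, assuming GNU or BSD find (`-empty` and `-delete` are extensions, not POSIX): since `-delete` implies `-depth`, a directory is tested only after its contents have been processed, so a parent whose only child was an empty directory is already empty by the time it is checked. The function name `clean_empty` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: remove zero-byte files, then empty directories, in one pass.
# Assumes GNU or BSD find; -empty and -delete are not in POSIX find.
clean_empty() {
    dir=$1
    # -size 0 counts 512-byte blocks rounded up, so it matches only zero-byte
    # files (e.g. empty .gophermap or robots.txt).
    find "$dir" -type f -size 0 -delete
    # -delete implies -depth: children are visited before their parent, so a
    # directory emptied earlier in this same run still matches -empty.
    # -mindepth 1 keeps the top-level directory itself.
    find "$dir" -mindepth 1 -type d -empty -delete
}
```

Usage would be `clean_empty archive/`; it is worth trying on a copy of the archive first.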
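One possible way to weed out the gophermaps that contain nothing but "3Error of some kind". This is a sketch under assumptions: a file counts as error-only when every non-blank line starts with the Gopher item type `3` (error), apart from an optional `.` end-of-menu line; only files named `gophermap` or `.gophermap` are checked; and the helper names are hypothetical. Filenames containing newlines would confuse the `while read` loop.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the file holds only type-3 error items.
is_error_only() {
    awk '
        { sub(/\r$/, "") }               # tolerate CRLF line endings
        /^3/ { err = 1; next }           # Gopher item type 3 = error
        $0 == "" || $0 == "." { next }   # ignore blank lines and the "." terminator
        { other = 1 }                    # anything else is real content
        END { exit !(err && !other) }
    ' "$1"
}

# Hypothetical helper: delete error-only gophermaps below the given directory.
remove_error_maps() {
    find "$1" -type f \( -name gophermap -o -name .gophermap \) -print |
    while IFS= read -r f; do
        if is_error_only "$f"; then
            rm -- "$f"
            printf 'removed %s\n' "$f"
        fi
    done
}
```

Running `remove_error_maps archive/` on a scratch copy and reviewing the "removed" lines before touching the real archive seems prudent.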
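For restoring gophermaps so they point to the mirror, one option is to rewrite the host and port fields of each menu line. In the Gopher menu format (RFC 1436) a line is the item type plus display string, then selector, host and port, separated by tabs. The sketch assumes the archived gophermaps hold menus in that raw form, possibly with CRLF line endings; lines with fewer than four tab-separated fields (such as Gophernicus-style entries that omit the host) pass through untouched. `repoint_gophermap` and the host names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: repoint menu entries from the original host to a mirror.
# Usage: repoint_gophermap OLD_HOST NEW_HOST NEW_PORT < gophermap > gophermap.new
repoint_gophermap() {
    awk -F '\t' -v OFS='\t' -v old="$1" -v new="$2" -v port="$3" '
        {
            cr = sub(/\r$/, "")          # strip a trailing CR, remember if there was one
            if (NF >= 4 && tolower($3) == tolower(old)) {
                $3 = new                 # field 3: host
                $4 = port                # field 4: port
            }
            printf "%s%s\n", $0, (cr ? "\r" : "")
        }
    '
}
```

For example, `repoint_gophermap gopher.example.org mirror.example.com 70 < gophermap > gophermap.new` rewrites only the entries that referred to the original server and leaves links to other hosts alone.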
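The hosts-file trick Kim describes presumably amounts to an entry like this on the machine running the spider, so that the original server name resolves to the box serving the archived copy. The address and host name are placeholders (192.0.2.0/24 is reserved for documentation).

```
# /etc/hosts on the spidering machine (placeholder address and name)
192.0.2.10   gopher.example.org
```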
_______________________________________________
Gopher-Project mailing list
Gopher-Project@lists.alioth.debian.org
http://lists.alioth.debian.org/mailman/listinfo/gopher-project