Re: Squid: list of currently cached objects?
On Sun, 11 May 1997, J.H.M. Dassen wrote:
> How can I get a list of the URLs of the objects that squid has currently
> cached?
awk '{print $6}' </var/spool/squid/log
The 'log' file format depends on the squid version. This is for squid
1.1.x - if you're still using the old squid 1.0.x you'll have to look at
the file to figure out which field to print with awk.
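If you're not sure which field to print, a quick sanity check is to number the fields of one log line. Here a sample 1.1.x line stands in for the real log; substitute `head -1 /var/spool/squid/log` for the echo to inspect your own:

```shell
# Number each whitespace-separated field of a log line; in squid 1.1.x
# the URL is field 6.
echo "00006075 3373d9ac fffffffe 33054581 1667 http://foo.com/path/file.html" |
    awk '{ for (i = 1; i <= NF; i++) print i ": " $i }'
```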
> Having such a list would allow me to use 'wget' to refresh the cache; this
> would be useful for my laptop system, which is not always on the net.
#! /bin/sh
proxy=some.host
port=3128
http_proxy=http://$proxy:$port/
ftp_proxy=http://$proxy:$port/
gopher_proxy=http://$proxy:$port/
export http_proxy ftp_proxy gopher_proxy
awk '{print $6}' </var/spool/squid/log | \
wget -q -nh -i /dev/stdin -O /dev/null
This is untested but it should work. If wget doesn't like working with
/dev/stdin then you'll have to redirect the output of awk to a temporary
file (e.g. "tmpfile=/tmp/wget.$$") and use that instead.
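The temp-file variant would look something like this (equally untested; a sample log line stands in for the real log file here, and the wget call is left as a comment):

```shell
# Write the URL list to a unique temp file and point wget at it,
# instead of using /dev/stdin. The trap cleans up even on interrupt.
tmpfile=/tmp/wget.$$
trap 'rm -f "$tmpfile"' 0
echo "00006075 3373d9ac fffffffe 33054581 1667 http://foo.com/path/file.html" |
    awk '{print $6}' >"$tmpfile"
cat "$tmpfile"
# then: wget -q -nh -i "$tmpfile" -O /dev/null
```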
The -q is for "quiet", the -nh is to disable DNS lookups of hostnames
(let squid do that as required). The "-O /dev/null" should make wget
just dump everything it fetches into the bit-bucket.
If you wanted to exclude certain URLs then you could insert a 'grep -v
<regexp> | \' line in between the awk and the wget.
e.g.
exclude="foo.com\|bar.org\|ftp://\|gopher://"
awk '{print $6}' </var/spool/squid/log | \
grep -v "$exclude" | \
wget -q -nh -i /dev/stdin -O /dev/null
excludes all ftp & gopher URLs, as well as everything from domains
foo.com and bar.org
I also have a sample perl script posted by Duane Wessels (squid author)
on the squid-user list for converting the log file into pathnames (this
only works if you have a single cache_dir):
#!/usr/bin/perl
$L1 = 16;       # Level 1 directories
$L2 = 256;      # Level 2 directories
while (<>) {
    $f = hex($_);
    $path = sprintf("%02X/%02X/%08X", $f % $L1, ($f / $L1) % $L2, $f);
    print "$path\n";
}
(modified slightly from Duane's original to suit my purposes)
Converts log lines like:
00006075 3373d9ac fffffffe 33054581 1667 http://foo.com/path/file.html
into lines like:
05/07/00006075
which are pathnames relative to the cache_dir (/var/spool/squid by
default on debian systems)
You can use this to extract information about URLs from the cache - the
first few lines (usually approx 6 or 8) of each cached file contain
"header" information about the URL for squid's use. e.g.
$ head -6 /var/spool/squid/00/00/00007001
HTTP/1.0 200 OK
Server: Netscape-Commerce/1.12
Date: Tuesday, 29-Apr-97 11:45:24 GMT
Last-modified: Friday, 28-Mar-97 01:11:23 GMT
Content-length: 656
Content-type: image/gif
"head -6" is inadequate - sometimes there are more than 6 headers. I
don't think there is ever less than 6. Unfortunately, the 'header'
program which comes with deliver doesn't work on these files (probably
because the "HTTP/1.0 ....." first line doesn't have a : in it)
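Rather than guessing a line count, one could print everything up to the first blank line, assuming the stored reply keeps HTTP's blank-line separator between headers and body (an assumption about the on-disk format; sample headers stand in for a real cache file here):

```shell
# Print the HTTP status line plus headers, however many there are,
# by stopping at the first blank line.
printf 'HTTP/1.0 200 OK\nContent-length: 656\nContent-type: image/gif\n\nGIF89a...\n' |
    sed '/^$/q'
```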
have fun!
craig
--
craig sanders
networking consultant Available for casual or contract
temporary autonomous zone system administration tasks.