
Re: Squid: list of currently cached objects?



On Sun, 11 May 1997, J.H.M. Dassen wrote:

> How can I get a list of the URLs of the objects that squid has currently 
> cached?

awk '{print $6}' </var/spool/squid/log

The 'log' file format depends on the squid version. This is for squid
1.1.x - if you're still using the old squid 1.0.x you'll have to look at
the file to figure out which field to print with awk.
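If you need to find out which field holds the URL in your version's log format,
something like this (untested) will number the fields of the first log line for
you:

    # print each field of the first log line with its field number,
    # so you can see which field number to give to awk's print
    head -1 /var/spool/squid/log | awk '{for (i = 1; i <= NF; i++) print i, $i}'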

> Having such a list would allow me to use 'wget' to refresh the cache; this
> would be useful for my laptop system, which is not always on the net.

    #! /bin/sh
    proxy=some.host
    port=3128

    # wget reads the proxy settings from the environment, so they
    # have to be exported
    http_proxy=http://$proxy:$port/
    ftp_proxy=http://$proxy:$port/
    gopher_proxy=http://$proxy:$port/
    export http_proxy ftp_proxy gopher_proxy

    awk '{print $6}' </var/spool/squid/log | \
      wget -q -nh -i /dev/stdin -O /dev/null

This is untested but it should work.  If wget doesn't like working with
/dev/stdin then you'll have to redirect the output of awk to a temporary
file (e.g. "tmpfile=/tmp/wget.$$") and use that instead.
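Something along these lines should do the job in that case (also untested; the
tmpfile name is just an example):

    # fallback if wget won't read the URL list from /dev/stdin
    tmpfile=/tmp/wget.$$
    awk '{print $6}' </var/spool/squid/log >$tmpfile
    wget -q -nh -i $tmpfile -O /dev/null
    rm -f $tmpfile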

The -q is for "quiet", the -nh is to disable DNS lookups of hostnames
(let squid do that as required). The "-O /dev/null" should make wget
just dump everything it fetches into the bit-bucket.


If you wanted to exclude certain URLs then you could insert a 'grep -v
<regexp> | \' line in between the awk and the wget.

e.g.

    exclude="foo.com\|bar.org\|ftp://\|gopher://"

    awk '{print $6}' </var/spool/squid/log | \
      grep -v "$exclude" | \
        wget -q -nh -i /dev/stdin -O /dev/null

This excludes all ftp & gopher URLs, as well as everything from the domains
foo.com and bar.org.




I also have a sample perl script posted by Duane Wessels (squid author)
on the squid-user list for converting the log file into pathnames (this
only works if you have a single cache_dir):


     #!/usr/bin/perl
     $L1 = 16;   # Level 1 directories
     $L2 = 256;  # Level 2 directories

     while (<>) {
       ($f) = split;               # first field is the object's file number, in hex
       $f = hex($f);
       $path = sprintf("%02X/%02X/%08X", $f % $L1, ($f / $L1) % $L2, $f);
       print "$path\n";
     }

(modified slightly from Duane's original to suit my purposes)

Converts log lines like:

00006075 3373d9ac fffffffe 33054581      1667 http://foo.com/path/file.html

into lines like:

    05/07/00006075

which are pathnames relative to the cache_dir (/var/spool/squid by
default on debian systems)
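e.g. something like this (untested; 'log2path.pl' is just whatever name you
save the perl script above under) should list the cached objects' files on
disk:

    # list the on-disk cache file for every object in the log
    cachedir=/var/spool/squid
    ./log2path.pl <$cachedir/log | \
      while read path; do
        ls -l "$cachedir/$path"
      done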


You can use this to extract information about URLs from the cache - the
first few lines (usually approx 6 or 8) of each cached file contain
"header" information about the URL for squid's use. e.g.

    $ head -6 /var/spool/squid/00/00/00007001 
    HTTP/1.0 200 OK
    Server: Netscape-Commerce/1.12
    Date: Tuesday, 29-Apr-97 11:45:24 GMT
    Last-modified: Friday, 28-Mar-97 01:11:23 GMT
    Content-length: 656
    Content-type: image/gif

"head -6" is inadequate - sometimes there are more than 6 headers. I
don't think there is ever less than 6. Unfortunately, the 'header'
program which comes with deliver doesn't work on these files (probably
because the "HTTP/1.0 ....." first line doesn't have a : in it)
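If you want the whole header block rather than a fixed number of lines,
something like this should work (untested; it assumes the stored headers end at
the first blank line, like a normal HTTP reply):

    # print everything up to (but not including) the first blank line
    awk 'NF == 0 { exit } { print }' /var/spool/squid/00/00/00007001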

have fun!

craig

--
craig sanders
networking consultant                  Available for casual or contract
temporary autonomous zone              system administration tasks.



