
Re: an idea for next generation APT archive caching



On Fri, Oct 22, 2004 at 02:21:17PM +1000, Jonathan Oxer wrote:
> On Fri, 2004-10-22 at 13:43 +1000, Paul Hampson wrote:
> 
> > Is there anything such a system would want to fetch from a Debian
> > mirror that doesn't show up in Packages.gz or Sources.gz?

> Yes, lots of things as I found out the hard way when I implemented
> object type checking in apt-cacher - even plain old .tar.gz if you want
> people to be able to fetch sources. Not good from a "don't use this as a
> general purpose relay" standpoint! The current checks in apt-cacher look
> like this:

> if ($filename =~ /(\.deb|\.rpm|\.dsc|\.tar\.gz|\.diff\.gz|\.udeb)$/) {

> } elsif ($filename =~ /(Packages.gz|Release|Sources.gz)$/) {
> ...
> } else {
> etc.

.rpm????
To my mind:
Packages.gz refers to the .deb or .udeb
Sources.gz refers to the .dsc, .orig.tar.gz and the .diff.gz
Release refers to Packages, but is this either necessary, or
widely used outside the Debian mirrors themselves? (Does apt
even use Release?)

I'm not even going to think about non-apt uses of this. ^_^
(Although circumvention of any checking is relatively easy: a
Sources.gz can refer to an .orig.tar.gz, which may contain anything
the web site owner wishes.)

And of course, this all is a complete shutout on apt-archives
without Packages.gz. ^_^
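
To make the "only fetch what the index lists" idea concrete, here is a
rough Python sketch, not anything apt-cacher actually does. It relies
only on the Filename: field that each Packages stanza carries; the
function names and the sample stanzas are made up for illustration.

```python
import posixpath


def allowed_paths(packages_text):
    """Collect the Filename: value of every stanza in a Packages index."""
    allowed = set()
    for line in packages_text.splitlines():
        if line.startswith("Filename:"):
            # Normalise the path so "../" tricks in a hostile index are visible.
            path = posixpath.normpath(line.split(":", 1)[1].strip())
            if path.startswith("/") or path == ".." or path.startswith("../"):
                continue  # refuse anything that escapes the archive root
            allowed.add(path)
    return allowed


def is_fetchable(request_path, allowed):
    """True if the requested path was listed in the index."""
    return posixpath.normpath(request_path.lstrip("/")) in allowed


# Hypothetical sample data, not taken from a real mirror.
SAMPLE = """\
Package: hello
Version: 2.1.1-4
Filename: pool/main/h/hello/hello_2.1.1-4_i386.deb

Package: evil
Filename: ../../etc/passwd
"""

ALLOWED = allowed_paths(SAMPLE)
```

As noted above, this only narrows things down: whoever controls the
index still controls what gets listed in it.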

Once a file's in the cache tree, then Apache can serve it directly,
and it'll be there until it's cleaned by some other process.

Now that I look in my /var/lib/apt/lists directory, I'm reminded of
the other gain I'd like to see made here... The output from apt-cache
policy shows the host name/IP and then the path under 'dists'. Which
means it can't visually distinguish packages from:
deb http://192.168.0.1:9999/debian sid main
deb http://192.168.0.1:9999/ipv6 sid main
both give: 500 http://192.168.0.1 sid/main Packages
(This is a mock-up... the IPv6 archive has ipv6 as pool, not 'main'.
I can't seem to find an example now, but I'm sure I used to hit one
all the time before. >_<)

Using this as a proxy means the source names don't change, and so the
hostnames become sensible/usable again. (Even though they're not
necessarily accurate ^_^).

> (It's trapping the Packages.gz etc files separately because you can't
> just cache them directly: you'd have namespace collisions all over the
> shop. They have to be stored separately in the cache based on the
> requested host, distro etc and then the names mapped back again when
> another request comes in).

Hopefully _that_ will be avoided by storing in a mirror-structured tree,
rooted at the mirror-source or something.
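
A minimal Python sketch of what I mean by a mirror-structured tree,
assuming the cache is keyed on the requested host plus the original
path. The cache root and the function name are hypothetical; the two
URLs are the mock-up sources from earlier in this mail.

```python
import posixpath
from urllib.parse import urlsplit

CACHE_ROOT = "/var/cache/apt-mirror-tree"  # hypothetical location


def cache_path(url, cache_root=CACHE_ROOT):
    """Map a requested URL to a file inside a per-host mirror tree."""
    parts = urlsplit(url)
    # Keep the port as part of the directory name, minus the colon.
    host = parts.netloc.lower().replace(":", "_")
    # Collapse "." and ".." segments so a request cannot climb out of the tree.
    rel = posixpath.normpath(parts.path).lstrip("/")
    if not host or rel in ("", ".", "..") or rel.startswith("../"):
        raise ValueError("not a cacheable archive URL: %r" % url)
    return posixpath.join(cache_root, host, rel)


BASE = "http://192.168.0.1:9999"
DEBIAN = cache_path(BASE + "/debian/dists/sid/main/binary-i386/Packages.gz")
IPV6 = cache_path(BASE + "/ipv6/dists/sid/main/binary-i386/Packages.gz")
```

Because each Packages.gz keeps its full path under the host directory,
the two index files above land in different places and nothing needs
to be mapped back on the next request.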

And for that I'm thinking something like the apt-proxy configuration
where the admin defines a mirror-type, hostnames to recognise, and
mirror sources to talk to.

Also an option for "dynamic mirrors" would be good, for any unrecognised
hostname to effectively autogenerate its own mirror directory.
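
Roughly, the lookup could work like the Python sketch below. This is
not apt-proxy's real configuration format; the table layout, the
"dynamic/" prefix and the function name are all assumptions of mine,
and the hostnames are only examples.

```python
import re

# Hypothetical admin-defined table: mirror name -> recognised
# hostnames and the backend sources to fetch from.
MIRRORS = {
    "debian": {
        "hostnames": {"ftp.debian.org", "ftp.au.debian.org"},
        "backends": ["http://ftp.au.debian.org/debian"],
    },
}

# Conservative hostname check so a dynamic name is safe as a directory.
HOST_RE = re.compile(r"[a-z0-9]([a-z0-9.-]*[a-z0-9])?")


def resolve_mirror(hostname, mirrors=MIRRORS, allow_dynamic=True):
    """Return (mirror directory name, list of backend URLs) for a host."""
    host = hostname.lower()
    for name, conf in mirrors.items():
        if host in conf["hostnames"]:
            return name, conf["backends"]
    if not allow_dynamic:
        raise KeyError("no mirror configured for %r" % hostname)
    if not HOST_RE.fullmatch(host) or ".." in host:
        raise ValueError("refusing odd hostname %r" % hostname)
    # Unrecognised host: give it its own directory and fetch from it directly.
    return "dynamic/" + host, ["http://" + host]
```

With allow_dynamic switched off this behaves like a plain admin-defined
list, which is probably the safer default for anything reachable from
outside.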

Of course, now I might be asking too much of mod_rewrite and/or
mod_proxy. I'll need to do some reading myself to determine if this is
possible in the form I hope for.

-- 
-----------------------------------------------------------
Paul "TBBle" Hampson, MCSE
7th year CompSci/Asian Studies student, ANU
The Boss, Bubblesworth Pty Ltd (ABN: 51 095 284 361)
Paul.Hampson@Anu.edu.au

"No survivors? Then where do the stories come from I wonder?"
-- Capt. Jack Sparrow, "Pirates of the Caribbean"

This email is licensed to the recipient for non-commercial
use, duplication and distribution.
-----------------------------------------------------------
