
Re: an idea for next generation APT archive caching



On Thu, 2004-10-21 at 13:04 +1000, Brian May wrote:

> * No thought put into the file deletion algorithm. IMHO, deleting
> files based on age is wrong (consider how long stable files
> last). Deleting files based on number of different copies is also
> wrong (consider if you have some systems setup with stable and another
> is unstable). IMHO, the only correct way is to scan the most recently
> downloaded Packages and Source index files and delete files that
> aren't mentioned anymore.

That's how apt-cacher does it. Early versions of apt-cacher did no cache
cleaning and it was the #1 requested feature for a while, but once I sat
down to actually start implementing it I discovered something that's not
obvious until you actually try to do it yourself: Writing Cache Expiry
Algorithms Is Bloody Hard(TM).

In the end I settled on a combination: Packages and Release files are
expired based on age, and .debs are purged based on reference within a
Packages file. However, that's not a 100% solution either because what
happens if several days go by without any clients doing an 'apt-get
update'? The Packages file is purged by the cache cleaning script
because it's too old, but then all the .debs are purged too because
there's no matching Packages file! Doh.

So it's necessary to keep fetching the Packages files within their
expiry time or the cache gets nuked.
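
A minimal sketch of that combination, in Python rather than
apt-cacher's own code. The function names, the age threshold and the
idea of keying a flat cache by basename are assumptions made for
illustration. The empty-index guard in debs_to_purge() is likewise an
assumption, shown as one way to avoid the "cache gets nuked" case
above; it is not something apt-cacher is known to do.

```python
import posixpath


def referenced_debs(packages_text):
    """Collect the basename of every file named by a Filename: field."""
    names = set()
    for line in packages_text.splitlines():
        if line.startswith("Filename:"):
            path = line.split(":", 1)[1].strip()
            names.add(posixpath.basename(path))
    return names


def expired_indexes(index_ages, max_age_days):
    """Index files (Packages, Release) are expired purely on age.

    index_ages maps a file name to its age in days (hypothetical input).
    """
    return {name for name, age in index_ages.items() if age > max_age_days}


def debs_to_purge(cached_debs, index_texts):
    """.debs are purged when no surviving Packages file mentions them."""
    if not index_texts:
        # Assumed guard: with no index left we cannot tell what is
        # stale, so purge nothing instead of emptying the cache.
        return set()
    live = set()
    for text in index_texts:
        live |= referenced_debs(text)
    return set(cached_debs) - live
```

Without the guard, an empty list of index files makes every cached
.deb look unreferenced, which is exactly the failure described above.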

> If you want a reliable caching service, I think some thought needs to
> be put into some of the issues above. Some issues might be easy to
> fix, others might be harder (e.g. minimizing latency so the client
> doesn't time out and to minimize download time but choosing the best
> server at the same time).

I haven't looked at them for this purpose in detail, but I still think
p2p systems are a natural for this. Layering .deb package retrieval onto
BitTorrent or similar would rock. I'm sure others know much more about
the issues though.

> You mean via HTTP? This would be possible to add, I think. I guess it
> hasn't been considered a priority.

Not necessarily, it depends on the cache architecture. Trying to do this
with apt-cacher, for example, would suck mightily because it uses a flat
cache structure. What's really needed to make it trivially browseable is
a cache that stores objects in a structure that mimics the original
mirror structure. My understanding is that apt-proxy v2 was written with
this in mind, but as usual I'm probably wrong.
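
To make the difference concrete, here is a rough Python sketch of the
two layouts. Both function names are hypothetical, and neither is taken
from apt-cacher or apt-proxy; the flat naming scheme in particular is
an assumption about what "flat" means here.

```python
import posixpath
from urllib.parse import urlsplit


def flat_cache_name(url):
    """Flat layout (assumed): every object sits in one directory,
    keyed only by its file name."""
    return posixpath.basename(urlsplit(url).path)


def mirrored_cache_path(url):
    """Mirror-like layout: keep the host and the full path, so the
    cache directory can be browsed like the upstream mirror."""
    parts = urlsplit(url)
    path = posixpath.normpath(parts.path).lstrip("/")
    return posixpath.join(parts.netloc, path)
```

With the mirror-like layout the cache directory could be exported as-is
by any web server, whereas the flat layout loses the dists/ and pool/
hierarchy that a browsing user would expect to see.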

Cheers  :-)

Jonathan Oxer
--
The Debian Universe: Installing, managing and using Debian GNU/Linux
http://www.debianuniverse.com/


