
Re: apt experimental breaks w/ webcaching (was Re: apt 0.6 in experimental)



On Thu, Jan 01, 2004 at 06:52:53PM -0800, Matt Zimmerman wrote:
> On Thu, Jan 01, 2004 at 07:34:26PM -0500, Anthony DeRobertis wrote:
> > Squid is certainly within its rights to not fetch a new version of  
> > Packages.gz and Sources.gz while fetching the new Release file, and it  
> > seems it happens. Especially with the amount of transparent proxying  
> > that exists in the wild, this could be a problem...
> 
> A cache which serves stale data is a broken cache.  I think that apt
> is within its rights to expect a consistent view from the world.  You
> would see other failures if you got mismatched versions of Release and
> Release.gpg.

Without perfect expiration data from the server, HTTP caches can't
fulfil this criterion: to do so they would have to contact the server
on every request, which defeats their purpose as caches. What if
somebody had manually fetched Packages.gz ten minutes before the mirror
sync, but Release was uncached so the cache had to fetch it for the apt
run after the mirror sync?

In fact, from the cache's point of view the files are probably *not*
stale. A quick check on ftp.uk.debian.org, for instance, confirms that
we don't send any Expires or Cache-Control headers (even if we did, they
couldn't be 100% accurate). Therefore the cache's only possible
freshness criteria are based on heuristic expiration guesses, and those
are unlikely to be tight enough to avoid occasional failures.
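
To illustrate how loose such a guess can be, here is a minimal sketch of
one common heuristic: treat a response as fresh for some fraction of the
time since it was last modified. The 10% fraction is the typical value
suggested in RFC 2616; real caches are configurable and may differ. The
function name and the dates are made up for the example.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def heuristic_freshness(date_hdr, last_modified_hdr, fraction=0.1):
    """Guess a freshness lifetime when the response carries no Expires
    or Cache-Control header (hypothetical helper, not Squid's code)."""
    date = parsedate_to_datetime(date_hdr)
    last_modified = parsedate_to_datetime(last_modified_hdr)
    # Guard against a Last-Modified that is later than Date.
    since_change = max(date - last_modified, timedelta(0))
    return since_change * fraction


# Illustrative values: a file fetched 24 hours after it last changed.
lifetime = heuristic_freshness(
    "Thu, 01 Jan 2004 20:00:00 GMT",
    "Wed, 31 Dec 2003 20:00:00 GMT",
)
print(lifetime)
```

With numbers like these the cache may keep serving the old Packages.gz
for a couple of hours without asking the server, which is easily long
enough to straddle a mirror sync.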

> This will probably go away, though, when apt 0.6 is fixed to send
> proper IMS requests for Release and Release.gpg.

A request with If-Modified-Since returns 304 Not Modified (and therefore
no message-body) if the entity has not been modified, so you can only
use this for files you already have. The caching problems above may well
be caused by requests from multiple systems, so Cache-Control would be a
more appropriate choice of header.
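
As a rough sketch of why IMS only helps for files you already hold: the
client sends the timestamp of its own copy, and a 304 means "keep using
what you have". The helper names below are hypothetical, and nothing here
is taken from apt itself.

```python
from email.utils import formatdate


def ims_headers(local_mtime):
    """Build the conditional header from the local copy's mtime
    (seconds since the epoch)."""
    return {"If-Modified-Since": formatdate(local_mtime, usegmt=True)}


def resolve_body(status, body, local_copy):
    """Pick the bytes to use after a conditional GET."""
    if status == 304:
        # Not Modified: no message-body was sent, so a local copy is
        # required for the answer to be of any use.
        if local_copy is None:
            raise ValueError("304 received but there is no local copy")
        return local_copy
    if status == 200:
        return body
    raise ValueError(f"unexpected status {status}")


headers = ims_headers(1072915200)  # 2004-01-01 00:00:00 UTC
print(headers)
```

Note that a 304 says nothing about whether an intermediate cache's copy
of some other file is current, which is the multi-system case above.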

Of course, Cache-Control requires you to make at least one uncontrolled
request first (or perhaps max-age=0 or no-cache?) in order that you know
the server's date, otherwise you run into clock synchronization
problems. It *is* possible to implement "get me file B and make sure
it's at least as new as file A" as an HTTP client, though, and if you
make sure your first file is genuinely fresh then the problem is solved
unless you're actually in a mirror sync at the time, in which case you
can just try again later.
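
A minimal sketch of that "B at least as new as A" client follows, under
some assumptions of mine: fetch() is a hypothetical callable taking a URL
and request headers and returning (response headers, body); the first
file is forced fresh with max-age=0; and freshness is compared using the
Date headers on the two responses rather than the local clock, which is
how the clock synchronization problem is sidestepped. This is one
possible reading, not a description of what apt does.

```python
import time
from email.utils import parsedate_to_datetime


def fetch_consistent(fetch, url_a, url_b, clock=time.monotonic):
    """Fetch A fresh, then fetch B no older than A.

    fetch(url, headers) -> (response_headers, body) is a hypothetical
    transport function supplied by the caller.
    """
    # Force caches to revalidate A so its Date is a trustworthy baseline.
    hdrs_a, body_a = fetch(url_a, {"Cache-Control": "max-age=0"})
    date_a = parsedate_to_datetime(hdrs_a["Date"])
    received_a = clock()

    # Accept B only if its age is no more than the time elapsed since
    # we received A. Rounding down makes the limit stricter, not looser.
    elapsed = int(clock() - received_a)
    hdrs_b, body_b = fetch(url_b, {"Cache-Control": f"max-age={elapsed}"})

    # Double-check against the Date headers; a cache that ignored the
    # request directive gets bypassed with no-cache.
    if parsedate_to_datetime(hdrs_b["Date"]) < date_a:
        hdrs_b, body_b = fetch(url_b, {"Cache-Control": "no-cache"})
    return body_a, body_b
```

If the second attempt still yields mismatched files, the likely cause is
a mirror sync in progress, and the caller can retry later.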

Cheers,

-- 
Colin Watson                                  [cjwatson@flatline.org.uk]


