
Re: apt experimental breaks w/ webcaching (was Re: apt 0.6 in experimental)

On Fri, Jan 02, 2004 at 03:44:02AM +0000, Colin Watson wrote:

> On Thu, Jan 01, 2004 at 06:52:53PM -0800, Matt Zimmerman wrote:
> > A cache which serves stale data is a broken cache.  I think that apt
> > is within its rights to expect a consistent view from the world.  You
> > would see other failures if you got mismatched versions of Release and
> > Release.gpg.
> Without perfect expiration data from the server, HTTP caches can't
> fulfil this criterion, otherwise they always need to contact the server
> and therefore can't properly fulfil their purpose as caches. What if
> somebody had manually fetched Packages.gz ten minutes before the mirror
> sync, but Release was uncached so the cache had to fetch it for the apt
> run after the mirror sync?

Their only purpose as caches is to improve performance; there's no rule that
they aren't allowed to contact the server to ensure that their response is
fresh enough.

> In fact, from the cache's point of view the files are probably *not*
> stale. A quick check on ftp.uk.debian.org, for instance, confirms that we
> don't send any Expires or Cache-Control headers (even if we did, they
> couldn't be 100% accurate). Therefore the cache's only possible freshness
> criteria are based on heuristic expiration guesses, and those are unlikely
> to be tight enough to avoid occasional failures.

Since Release and Packages/Sources are generated at close to the same time,
they should always be about the same age, and the heuristics should apply to
them equally.  Inconsistent Packages/Sources/Release really ought to be a
rather infrequent case.
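
As a rough illustration of the point about heuristics applying equally, here is a minimal sketch of a Last-Modified-based freshness guess. The 10% fraction is an assumption (a commonly cited default, not something any particular cache is confirmed to use here), and the names heuristic_lifetime, is_fresh and verdicts are made up for this example.

```python
from datetime import datetime, timedelta, timezone

def heuristic_lifetime(fetched, last_modified, fraction=0.1):
    # Guess a freshness lifetime as a fraction of how old the file
    # already was when the cache fetched it (fraction is an assumption).
    return (fetched - last_modified) * fraction

def is_fresh(now, fetched, last_modified, fraction=0.1):
    # The cached copy is served without revalidation while its age
    # in the cache is below the guessed lifetime.
    return (now - fetched) < heuristic_lifetime(fetched, last_modified, fraction)

# Hypothetical timeline: both index files were generated by the same
# mirror sync and pulled into the cache at the same moment.
sync = datetime(2004, 1, 1, 0, 0, tzinfo=timezone.utc)
fetched = sync + timedelta(hours=6)
cached = {
    "Release": (fetched, sync),
    "Packages.gz": (fetched, sync),
}

def verdicts(now):
    # Freshness decision for every cached file at time `now`.
    return {name: is_fresh(now, f, lm) for name, (f, lm) in cached.items()}
```

With identical Last-Modified and fetch times, the two files always get the same verdict. The verdicts only diverge when the files entered the cache at different times, which is the scenario raised in the quoted text above.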

> A request with If-Modified-Since returns 304 Not Modified (and therefore
> no message-body) if the entity has not been modified, so you can only use
> this for files you already have. The caching problems above may well be
> caused by requests from multiple systems, so Cache-Control would be a more

APT almost always has old copies of the indexes that it needs, so an
If-Modified-Since (IMS) request is very efficient.  You are correct that it
cannot guarantee consistency, but I believe that it would have prevented the
problem in this case.
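
For readers unfamiliar with conditional requests, a minimal sketch of the client side follows. It is not APT's actual http method; the URL, build_ims_request and handle_status are hypothetical names used only for illustration, and no network access is performed.

```python
import email.utils
import urllib.request

def build_ims_request(url, local_mtime):
    # Attach the modification time of the copy we already hold, so the
    # server (or cache) can answer 304 instead of resending the body.
    req = urllib.request.Request(url)
    req.add_header(
        "If-Modified-Since",
        email.utils.formatdate(local_mtime, usegmt=True),
    )
    return req

def handle_status(status):
    # 304 carries no message-body: keep using the local copy.
    if status == 304:
        return "keep-local-copy"
    # 200 carries a full, newer entity: replace the local copy.
    if status == 200:
        return "replace-local-copy"
    return "error"

# Hypothetical index URL; local_mtime is seconds since the epoch.
request = build_ims_request("http://example.org/dists/sid/Release", 0)
```

Because a 304 has no body, this only helps when an older copy already exists locally, which is the usual situation for index files.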

> Of course, Cache-Control requires you to make at least one uncontrolled
> request first (or perhaps max-age=0 or no-cache?) in order that you know
> the server's date, otherwise you run into clock synchronization
> problems. It *is* possible to implement "get me file B and make sure
> it's at least as new as file A" as an HTTP client, though, and if you
> make sure your first file is genuinely fresh then the problem is solved
> unless you're actually in a mirror sync at the time, in which case you
> can just try again later.

Hmm, APT's http method already sends Cache-Control: max-age for Packages and
Sources, just not for Release.  I can fix that.
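
A minimal sketch of what sending the same header for every index file could look like. This is not APT's code: the file list, the flat URL layout, build_index_requests and the max-age value of 0 are all assumptions chosen for illustration (the value APT sends is not stated here).

```python
import urllib.request

# Hypothetical set of index files, with Release treated like the others.
INDEX_FILES = ("Release", "Release.gpg", "Packages.gz", "Sources.gz")

def build_index_requests(base_url, max_age=0):
    # A request-side "Cache-Control: max-age=N" tells intermediate caches
    # that the client will not accept a response older than N seconds;
    # max-age=0 forces revalidation with the origin server.
    requests = {}
    for name in INDEX_FILES:
        req = urllib.request.Request(f"{base_url}/{name}")
        req.add_header("Cache-Control", f"max-age={max_age}")
        requests[name] = req
    return requests

# Hypothetical mirror URL; nothing is fetched here.
index_requests = build_index_requests("http://example.org/dists/sid")
```

Applying one policy to the whole set means a cache cannot serve a heuristically fresh Release next to a revalidated Packages file.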

 - mdz
