Re: apt experimental breaks w/ webcaching (was Re: apt 0.6 in experimental)
On Fri, Jan 02, 2004 at 09:11:31AM -0800, Matt Zimmerman wrote:
> On Fri, Jan 02, 2004 at 03:44:02AM +0000, Colin Watson wrote:
> > On Thu, Jan 01, 2004 at 06:52:53PM -0800, Matt Zimmerman wrote:
> > > A cache which serves stale data is a broken cache. I think that apt
> > > is within its rights to expect a consistent view from the world. You
> > > would see other failures if you got mismatched versions of Release and
> > > Release.gpg.
> > Without perfect expiration data from the server, HTTP caches can't
> > fulfil this criterion, otherwise they always need to contact the server
> > and therefore can't properly fulfil their purpose as caches. What if
> > somebody had manually fetched Packages.gz ten minutes before the mirror
> > sync, but Release was uncached so the cache had to fetch it for the apt
> > run after the mirror sync?
> Their only purpose as caches is to improve performance; there's no rule that
> they aren't allowed to contact the server to ensure that their response is
> fresh enough.
True, but I quote (RFC 2616 section 13):
  Caching would be useless if it did not significantly improve
  performance. The goal of caching in HTTP/1.1 is to eliminate the need
  to send requests in many cases, and to eliminate the need to send
  full responses in many other cases. The former reduces the number of
  network round-trips required for many operations; we use an
  "expiration" mechanism for this purpose (see section 13.2). The
  latter reduces network bandwidth requirements; we use a "validation"
  mechanism for this purpose (see section 13.3).
You're describing the second part of this, but the first is also
important to the performance of caches. Round trips to the server often
dominate the response time. Release is a good example, since it's small,
but Packages.gz might also be small enough for this to be relevant in
the case of repositories other than Debian main.
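To put the two mechanisms side by side, here is a rough Python sketch of the decision a cache makes for a stored response. It's a simplification under stated assumptions: the CachedEntry record and the function names are made up for illustration, Expires and the Age/request-delay terms of section 13.2.3 are left out, and the 10% Last-Modified fraction is only the "typical" setting section 13.2.4 mentions. Real caches make it configurable.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed heuristic: RFC 2616 13.2.4 says a typical fraction "might be 10%".
HEURISTIC_FRACTION = 0.1


@dataclass
class CachedEntry:
    # Hypothetical cache record; all times are seconds since the epoch.
    date: float                       # Date header of the stored response
    last_modified: Optional[float]    # Last-Modified header, if any
    response_time: float              # when the cache received the response
    max_age: Optional[float] = None   # Cache-Control: max-age, if sent


def freshness_lifetime(e: CachedEntry) -> float:
    # Explicit expiration wins; Expires is omitted from this sketch.
    if e.max_age is not None:
        return e.max_age
    # No explicit data (as on the Debian mirrors): fall back to a guess
    # based on how long ago the file last changed.
    if e.last_modified is not None:
        return HEURISTIC_FRACTION * max(0.0, e.date - e.last_modified)
    return 0.0


def current_age(e: CachedEntry, now: float) -> float:
    # Simplified 13.2.3: apparent age at receipt plus time spent in cache.
    apparent_age = max(0.0, e.response_time - e.date)
    return apparent_age + (now - e.response_time)


def round_trips_needed(e: CachedEntry, now: float) -> int:
    if current_age(e, now) < freshness_lifetime(e):
        return 0  # expiration model: answer from cache, origin never contacted
    return 1      # validation model: conditional GET, possibly a bodyless 304
```

A fresh hit costs no round trip at all; anything else costs at least one, even when the origin only answers 304 Not Modified.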
> > In fact, from the cache's point of view the files are probably *not*
> > stale. A quick check on ftp.uk.debian.org, for instance, confirms that we
> > don't send any Expires or Cache-Control headers (even if we did, they
> > couldn't be 100% accurate). Therefore the cache's only possible freshness
> > criteria are based on heuristic expiration guesses, and those are unlikely
> > to be tight enough to avoid occasional failures.
> Since Release and Packages/Sources are generated at close to the same time,
> they should always be about the same age, and the heuristics should apply to
> them equally.
They will be the same age on the *server*, but not necessarily in the
cache, as my example quoted up at the top of this message shows. The
cachedness or otherwise of an entity is very likely to be an input to
the caching heuristics.
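The scenario from the top of this message can be played through as a toy simulation. Everything here is a hypothetical model rather than any real cache's behaviour: a 6-hour sync interval, every file regenerated at each sync (so Release and Packages.gz have identical ages on the server), no Expires or Cache-Control headers, the 10% Last-Modified heuristic, and a cache that simply refetches stale entries instead of revalidating them (which makes no difference here, because the origin copy has changed anyway).

```python
from dataclasses import dataclass
from typing import Dict

HEURISTIC_FRACTION = 0.1   # assumed heuristic, as in RFC 2616 13.2.4
SYNC_INTERVAL = 6 * 3600   # hypothetical mirror sync interval in seconds


def mirror(path: str, now: float):
    # Every file is regenerated at each sync, so all files share one age
    # on the server. Returns (sync number, Last-Modified time).
    sync = int(now // SYNC_INTERVAL)
    return sync, float(sync * SYNC_INTERVAL)


@dataclass
class Stored:
    version: int           # which sync produced the stored body
    date: float            # fetch time (Date == receipt time in this model)
    last_modified: float


class HeuristicCache:
    """Toy shared cache: with no explicit expiry from the origin, only the
    Last-Modified heuristic decides whether a stored copy is fresh."""

    def __init__(self) -> None:
        self.store: Dict[str, Stored] = {}

    def get(self, path: str, now: float, origin) -> int:
        entry = self.store.get(path)
        if entry is not None:
            lifetime = HEURISTIC_FRACTION * (entry.date - entry.last_modified)
            if now - entry.date < lifetime:
                return entry.version  # fresh hit: origin not contacted
        version, last_modified = origin(path, now)
        self.store[path] = Stored(version, now, last_modified)
        return version


cache = HeuristicCache()
sync_time = float(SYNC_INTERVAL)
# Someone fetches Packages.gz by hand ten minutes before the mirror sync.
cache.get("Packages.gz", sync_time - 600, mirror)
# apt runs shortly after the sync; Release has never been cached.
apt_run = sync_time + 400
packages = cache.get("Packages.gz", apt_run, mirror)
release = cache.get("Release", apt_run, mirror)
print(packages, release)  # 0 1: Packages.gz predates the sync, Release does not
```

Packages.gz was cached with a heuristic lifetime of about 2100 seconds, so it is still "fresh" 1000 seconds later, while Release has to come from the origin and reflects the new sync. The cache has done nothing wrong by its own rules, yet apt sees a mismatched pair.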
Colin Watson [firstname.lastname@example.org]