Hi,
tl;dr: snapshots.debian.org reproducibly truncates around 1-2% of the
downloads. While I think that's unusually high, it turns into a real
problem in combination with "lazy" clients (or proxies) that do not
rigorously check their response.
Longer description: I wrote a test script that tries a lot of HTTP GET
requests on random resources from snapshot.debian.org, collecting
statistics on the responses. Independent of the origin used [0], this
reveals the following patterns:
98% - 99.5% of all requests return a good status code and the
downloaded body matches in size and by sha1 hash.
0.5% - 1.5% of responses are truncated, i.e. the connection closes
before all data is received
< 0.5% other errors like 403, 503 or hash mismatches
Interestingly, for immediate retries, the distribution changes a lot for
the second attempt:
50% - 76% successes
17.6% - 50% truncated responses
up to ca. 5% other errors
Further retries show a roughly similar success rate, so that eventually,
after enough retries, the client gets the expected response.
(With deferred retries, giving varnish a chance to populate the cache,
the second attempt stats tend a lot towards 100% success, therefore I
conclude that the truncation failures are related to cache misses.)
Manual inspection of the HTTP response headers from snapshot.debian.org
reveals, that varnish is in use (and that somebody there is a Terry
Pratchett fan). It seems varnish delivers cached resources with the
usual "Content-Length" field in its headers, but uses
"Transfer-Encoding: chunked" for cache misses. (Which sounds like an
entirely reasonable thing.)
But back to the facts: with an additional apt-cache-ng proxy in between,
the success rate for follow-up requests drops to nearly 0%. The result
distribution for the second attempt then looks more like:
97.4% hash mismatch
1.2% successes
1.2% error 503
For further attempts, these tend towards 100% hash mismatches, because
apt-cache-ng stores an invalid (truncated) copy in its cache.
I tested chunk transfer encoding of apt-cacher-ng a bit, but since
version 0.8.0_pre2-1 of apt-cacher-ng, it seems handle chunking just
fine [1]. And I tried to reproduce the error on the varnish side, but
without success so far.
So I'm not entirely sure where exactly the problem lies, but I am
currently suspecting varnish because of two rather vague observations:
a) on traffic intercepted between a apt-cacher-ng and varnish, I saw
an incorrect resource length advertized by varnish (only once, by
manual interception) [2]. I intend to try to reproduce this
automatically to test the varnish side.
b) a varnish issue that sounds related and got fixed just a couple
days ago:
https://github.com/varnishcache/varnish-cache/issues/2035
I very much doubt this fix has already made it to the snapshots
infrastructure. It certainly didn't arrive in Debian, yet.
I cannot proof either of these is really related or the cause of the
hash mismatches for the combination with varnish with apt-cacher-ng.
Before continuing to investigate, I simply wanted to share my findings.
I'd welcome feedback as well as hints about the infrastructure, that
might be relevant. I intend to continue analyzing varnish and
apt-cacher-ng to be able to eventually solve this issue.
Kind Regards
Markus Wanner
[0]: I tried from 5 different sources with highly varying bandwidth.
Most of those are located in Europe, though. And during European daytime.
[1]: Chunked transfer encoding seems to work since this commit:
commit eebb3c93fd63ceb479b7100f71861069f5b16914
Author: Eduard Bloch <blade@debian.org>
Date: Sat Aug 30 20:33:49 2014 -0700
Fix for premature abort of chunked transfer
[2]: The intercepted request on a file weighting 474 KiB:
GET
/archive/debian/20140818T083621Z/pool/main/libg/libgnomeui/libgnomeui-dev_2.24.5-2_kfreebsd-amd64.deb
HTTP/1.1
User-Agent: Debian Apt-Cacher-NG/0.8.0
Host: snapshot.debian.org
Range: bytes=122879-
Accept: */*
Accept-Encoding: identity
Connection: keep-alive
HTTP/1.1 206 Partial Content
Date: Thu, 08 Dec 2016 14:46:21 GMT
Server: Apache
Expires: Sun, 18 Dec 2016 14:46:21 GMT
Cache-Control: public, max-age=864000
ETag: "1aef9a4d688e1e1a0bd09427b894e47fd37413dc"
Last-Modified: Mon, 05 Sep 2011 15:23:40 GMT
X-Clacks-Overhead: GNU Terry Pratchett
Content-Type: application/x-debian-package
X-Varnish: 102013332
Age: 0
Via: 1.1 varnish-v4
Transfer-Encoding: chunked
Connection: keep-alive
Accept-Ranges: bytes
Content-Range: bytes 122879-122879/122880
001
.
0
i.e. varnish answered with a chunk of one byte (which may or may not be
correct) but a known incorrect value for the complete-range. I certainly
wouldn't blame apt-cacher-ng for truncation of the file in that case.
Attachment:
signature.asc
Description: OpenPGP digital signature