
Re: apt-get update hashsum mismatch prevent



Hi!

It's great to see this discussion restarted. (Please Cc me in replies.)

> (1) apt fetches the InRelease file and if during that fetching the
> server updates its indexfiles the subsequent GET for the indexfiles
> will fail with a hashsum mismatch because the InRelease file has the
> hashes of the previous generation of the indexfiles.
> 
> (2) apt fetches a new InRelease file but the new indexfiles are not
> updated/mirrored yet. A hashsum mismatch error is found because the
> new InRelease file hashes do not match the old indexfiles.
> 
> 
> Problem (2) is of less relevance right now because AIUI our mirror
> scripts updates in multiple steps, i.e. sync pool, sync indexes, sync
> release. But its still worth thinking if this could be simplified.

Please don't assume mirrors actually update in that fashion and please don't 
discount (2) as a problem. Working to simplify the update (which is code we do 
not control) and instead embody some more complexity in apt and dak (which is 
code we do control) is definitely worth it. My experience is that there are 
relatively few mirrors that actually do this update right; Raphael may have 
some hard stats on this from http.d.n.

Worse is when an intermediate mirror in the chain updates while its upstream 
peer is itself updating. Not so long ago, ftp.au.d.o was updating (through 
cron, it seemed) from a kernel.org mirror and, for a period of a week or so, 
each and every time it updated it managed to hit the exact race condition described 
above. Since almost all other mirrors in Australia were updating from 
ftp.au.d.o, that meant that just about every mirror in the region was 
unusable. (I frequently find that the nearest mirrors with consistent metadata 
are Taiwan, India, New Caledonia or Vanuatu.)

Over the last couple of years, I've spent a lot of time contacting mirror 
admins, trying to improve the state of our mirrors. Where the problem was just 
a cron job that needed kicking, that's been successful. In terms of solving 
structural problems with mirror updates rather than temporary problems of not 
updating, I've had precisely zero success.

In talking to mirror admins, the feedback I have received is (paraphrasing):

* Why is Debian a special snowflake? We mirror everything else just using a 
single rsync pass.

* You want me to run some script delivered over ftp? Yay, there's a gpg 
signature from some guy I've never heard of. Are you going to pay me for my 
time to audit hundreds of lines of shell script before I deploy it? Why isn't 
it a single rsync command again?

* I'm already using ftpsync. What... you updated it again? *sigh* I've got to 
go through and recheck everything with it and get managerial approval to run 
it again...

* You want me to give some random Debian person I've never heard of an ssh 
login so you can do push mirroring? No chance, not even a key-only command-
restricted login. (include various reasons about ceding control of resources, 
blocked inbound connections, institutional policies, etc)

Quoting from a private mail from a mirror admin: «I’d dearly love the Debian 
project to actually fix their stuff, as the other 100+ archives we have mostly 
have managed. Fedora addressed the metadata problem you mentioned in the 
linked blog post years ago.... In fact, even the ftpsync scripts don’t 
properly address the problem.»

Perhaps Packages-$SHASUM files (along with Sources-*, Translation-*, 
Contents-*) to protect the metadata should be on the menu for fixing this 
properly. The Release file would contain the hashes for both new and old 
metadata files, and both new and old metadata files would exist on the mirror. 
Yes, that doubles the amount of metadata on the mirrors (although we could 
probably avoid this dance for stable releases, or only include the 'old' 
hashes for a day or two after a point release and then get rid of them).
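
As a rough illustration of the idea (a sketch only: the in-memory dict 
standing in for a mirror, the function names publish and fetch_verified, and 
the sample Packages contents are all made up for this example, and none of it 
is apt or dak code), hash-qualified names let a client holding the previous 
Release still fetch metadata that matches it while the mirror is mid-update:

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of data."""
    return hashlib.sha256(data).hexdigest()


def publish(mirror: dict, directory: str, name: str, content: bytes) -> str:
    """Store content under a hash-qualified name (e.g. Packages-<sha256>).

    Older generations are left in place, so both old and new files coexist.
    Returns the digest that a Release file would list for this generation.
    """
    digest = sha256_hex(content)
    mirror[f"{directory}/{name}-{digest}"] = content
    return digest


def fetch_verified(mirror: dict, directory: str, name: str, digest: str) -> bytes:
    """Fetch the file for the given digest and check it really matches."""
    data = mirror[f"{directory}/{name}-{digest}"]
    if sha256_hex(data) != digest:
        raise ValueError("hashsum mismatch")
    return data


# Hypothetical mirror contents, keyed by path.
mirror = {}
directory = "dists/unstable/main/binary-amd64"
old_packages = b"Package: foo\nVersion: 1.0\n"
new_packages = b"Package: foo\nVersion: 1.1\n"

# Generation 1 is published; a client downloads a Release listing old_digest.
old_digest = publish(mirror, directory, "Packages", old_packages)

# The mirror then syncs generation 2 before the client fetches its indexes.
new_digest = publish(mirror, directory, "Packages", new_packages)

# The client with the old Release still gets consistent (old) metadata,
# and a client with the new Release gets the new metadata.
client_old = fetch_verified(mirror, directory, "Packages", old_digest)
client_new = fetch_verified(mirror, directory, "Packages", new_digest)
```

With a single mutable Packages path, the second publish would have 
overwritten the first and the client holding the old hashes would see a 
hashsum mismatch.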


Even if we don't have the appetite for creating a set of Packages-$SHASUM files 
(and Sources-* and Translation-*) to protect the metadata from this update as 
Fedora has done, there are a couple of quick wins we could consider:

* I suspect that just moving installer-* directories out of dists/ would go a 
very long way towards reducing the time over which these problems occur by 
making the rsync of dists/ a much smaller data set. (like cdimage, is there 
any need for them to be on the mirror network at all?)

* Removing Translation-* from Release would also help this (which was what 
Ubuntu had done last time I looked at a Release file from them). Once again, 
this has the effect of reducing the size of the material in dists/ that needs 
to be synced 'atomically'. (Ship a separate set of signed hashes for 
Translations if we think anyone actually cares about signed Translations?)

* I'd love to be able to tell apt to get metadata from one mirror (slow but 
reliable) and packages from a different one (inconsistent metadata but has a 
fat pipe and is in the next building). I don't know if doing that in apt or in 
a proxy would be best.
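
The split-mirror idea in the last point could be sketched roughly as follows. 
This is only an illustration of the routing decision a proxy might make, not 
an existing apt feature; both hostnames and the url_for helper are 
hypothetical. It assumes the usual archive layout, with metadata under dists/ 
and packages under pool/:

```python
# Hypothetical mirrors: one slow but consistent, one fast but often mid-sync.
METADATA_MIRROR = "http://reliable.example.org/debian"
PACKAGE_MIRROR = "http://nearby.example.org/debian"


def url_for(path: str) -> str:
    """Map an archive-relative path to the mirror that should serve it.

    Anything under dists/ (Release, Packages, Sources, ...) goes to the
    reliable mirror; everything else (pool/) goes to the nearby one.
    """
    path = path.lstrip("/")
    base = METADATA_MIRROR if path.startswith("dists/") else PACKAGE_MIRROR
    return f"{base}/{path}"


release_url = url_for("dists/unstable/InRelease")
deb_url = url_for("/pool/main/a/apt/apt_1.0_amd64.deb")
```

Since packages are checked against the hashes in the signed metadata, 
fetching them from a different host than the metadata should not weaken 
verification.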


So... if we're doing work to improve our mirroring, then it would be great if 
we could really address this problem. Coincidentally, unique names for content 
(such as Packages-$SHASUM) would also help any efforts to distribute the 
archive via a CDN such as cloudfront.debian.net.

cheers
Stuart

-- 
Stuart Prescott    http://www.nanonanonano.net/   stuart@nanonanonano.net
Debian Developer   http://www.debian.org/         stuart@debian.org
GPG fingerprint    90E2 D2C1 AD14 6A1B 7EBB 891D BBC1 7EBB 1396 F2F7

