
Re: Hashsum mismatch prevention strategies

On Fri, May 11, 2012 at 9:30 AM, Bernhard R. Link <brlink@debian.org> wrote:
> * David Kalnischkies <kalnischkies@gmail.com> [120511 03:03]:
>> Option A is that each mirror (if it chooses to do it) builds a big "index" of
>> hashsum-named hardlinks to the "old" location of the file. Given a
>> repository like this:
>> unstable/InRelease
>> unstable/main/binary-amd64/Packages
>> unstable/main/binary-i386/Packages
>> this would mean that we e.g. have a
>> unstable/by-hash/sha256/bbbbb… -> unstable/main/binary-amd64/Packages
>> unstable/by-hash/sha256/ccccc… -> unstable/main/binary-i386/Packages
>> (Imagine this being done for e.g. md5 and sha1 hashes, too)
>> A client like apt would then request the InRelease file as usual and then take
>> the hashes it can extract out of it to request the other files it needs.
> If this is implemented, please add some field in InRelease to denote
> those files are there and do not make any program look for those files
> unless that flag is found. (In the past apt-get often got support for
> things not yet in the Release files, thus guessing their availability
> by trying to download them unconditionally. I think that is a mistake
> that should not be repeated).

As mentioned later in my mail, this need for a fallback is my biggest concern
here, too. From the client's point of view I would love to get to a point where
I don't need to guess and fall back for files anymore. Thankfully, apart from
the compatibility fallbacks we still have to keep around for other archives,
apt isn't guessing anymore, at least for the Debian archive.

The thing is, the whole decentralized part of this option falls apart if
we add a flag to the Release file. A compromise would be an additional
(unsigned) file listing the available hashsum implementations; a client
would compare it with what it supports and choose the strongest one they
have in common. That would at least be just one file which might or
might not be there…
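
A minimal sketch of that comparison, assuming the unsigned file simply
lists one hash name per line. The file layout and the preference order
are assumptions made for illustration, not something apt implements:

```python
# Hypothetical preference order, weakest to strongest (an assumption).
PREFERENCE = ["md5", "sha1", "sha256"]


def parse_available(listing_text):
    """Parse the hypothetical unsigned listing: one hash name per line."""
    return {line.strip().lower()
            for line in listing_text.splitlines() if line.strip()}


def strongest_common(available, supported, preference=PREFERENCE):
    """Return the strongest hash both sides know, or None to fall back."""
    for name in reversed(preference):
        if name in available and name in supported:
            return name
    return None
```

If the file is missing, the client would pass an empty set and get None
back, which means "use the old locations".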

>> On a mirror sync the index files will be updated and get new hashes,
>> but a new client still working with the old InRelease file will still get
>> the old index files based on their hash.
> When reading this I thought you meant that repository generation tools
> will always generate those files and keep older versions and the client
> will always download the hashed names, but then:
>> As it is the mirror that generates the by-hash links, it would be free to not
>> do it and/or to store old indexes for a self-chosen length of time. Given that,
>> a client needs to fall back for every file it can't get by-hash and request it
>> by its "old" location -- and in the long run it has to check for different
>> checksums as we move to stronger hashes over time.
> Urgs, the mirror should generate them? I do not think that this is a
> good idea at all.

(thank god, I am not the only one)

Again, the argument here is that each mirror can independently choose to start
doing this, so that the master archive doesn't need to be changed to support
it now (or possibly ever).
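
To make the cost of that fallback concrete, here is a rough sketch of
what a client would have to do under Option A. The by-hash layout is
the one from the example above; fetch() is a hypothetical caller-supplied
downloader returning bytes or None, and the order of hashes tried is an
assumption:

```python
import hashlib

ORDER = ("sha256", "sha1", "md5")  # strongest first (assumed order)


def verify(data, hashes):
    """Check data against the strongest hash listed in InRelease."""
    for name in ORDER:
        if name in hashes:
            return hashlib.new(name, data).hexdigest() == hashes[name]
    return False


def fetch_index(fetch, suite, path, hashes):
    """Try every by-hash location first, then the "old" location.

    hashes maps a hash name to the hex digest taken from InRelease.
    Returns (location, data), or (None, None) if nothing usable exists.
    """
    candidates = [f"{suite}/by-hash/{name}/{hashes[name]}"
                  for name in ORDER if name in hashes]
    candidates.append(f"{suite}/{path}")  # fallback to the old location
    for candidate in candidates:
        data = fetch(candidate)
        if data is not None and verify(data, hashes):
            return candidate, data
    return None, None
```

Every candidate that is not there costs one extra request, which is
exactly the guessing I would like to avoid.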

>> Option B would be to introduce "versioned" components. The InRelease file
>> would include a tag specifying a version (a good version would e.g. be
>> the date(time) of the creation) for the components it includes.
>> A client would then not request files under $component but under
>> $component-$version, e.g. instead of main it would be main-2012-05-10.
>> An old client would "just" follow the link from main to its current versioned
>> spin-off, similar to how unstable links to sid. As the InRelease file includes
>> a new tag which a new client needs in order to use this "feature", we don't
>> need a fallback.
> How about making that a subdirectory instead?
> i.e.
> 2012-05-10/main
> 2012-05-10/contrib
> 2012-05-10/non-free
> main -> 2012-05-10/main
> contrib -> 2012-05-10/contrib
> non-free -> 2012-05-10/non-free

Fine as well. Actually even better. Thanks!
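
On the client side this variant could look roughly like the sketch
below. The field name "Component-Version" is purely hypothetical (no
such field exists in Release files today); only the directory layout is
taken from the suggestion above:

```python
VERSION_FIELD = "Component-Version"  # hypothetical field name


def read_version(release_text, field=VERSION_FIELD):
    """Return the value of the version field, or None if it is absent."""
    for line in release_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == field.lower():
            return value.strip()
    return None


def index_path(suite, component, rest, version=None):
    """Build the path of an index file below a suite.

    With a version the client asks for <suite>/<version>/<component>/...;
    without one it uses the plain component name, which the archive
    links to the current version for old clients.
    """
    if version:
        return f"{suite}/{version}/{component}/{rest}"
    return f"{suite}/{component}/{rest}"
```

A new client would call read_version() on the InRelease file and pass
the result on; an old archive without the field yields None and the
path stays as it is today.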

> I see both suggestions involve symlinks. Do all (or at least all
> official mirrors) support symlinks?

As I said, symlinks are currently used to link archive names to
codenames (stable -> squeeze, unstable -> sid, …), so I guess
they are supported.

Option A actually makes heavy use of hardlinks in its current form
(I am not sure how well supported these are), but it should be
possible to do the same with a couple of symlinks instead.
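
For illustration, the mirror-side step could be sketched like this. It
is not an existing tool; the function name and parameters are made up,
and only the by-hash/<algo>/<digest> layout comes from the proposal:

```python
import hashlib
import os


def build_by_hash(suite_dir, index_paths, algos=("sha256",), use_symlinks=True):
    """Create by-hash/<algo>/<digest> links for the given index files.

    index_paths are relative to suite_dir. Returns the links created.
    """
    created = []
    for rel in index_paths:
        src = os.path.join(suite_dir, rel)
        with open(src, "rb") as fh:
            data = fh.read()
        for algo in algos:
            digest = hashlib.new(algo, data).hexdigest()
            link_dir = os.path.join(suite_dir, "by-hash", algo)
            os.makedirs(link_dir, exist_ok=True)
            link = os.path.join(link_dir, digest)
            if os.path.lexists(link):
                continue  # this version of the index is already linked
            if use_symlinks:
                # relative target, so the tree can be moved or rsynced
                os.symlink(os.path.relpath(src, link_dir), link)
            else:
                os.link(src, link)
            created.append(link)
    return created
```

One difference to keep in mind: a symlink to the live path follows
later updates of that path, so keeping old indexes reachable with
symlinks presumably means pointing them at a retained copy instead.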

>> So, in short: What do you think? Is there an option C or are there
>> features/problems in A or B which I have omitted/overlooked?
> One could also look into the client side: an InRelease file is not that
> big, so if the hashes do not match, one could reload the InRelease
> file to see if anything changed.

Something clever needs to be done if the InRelease file we got is
newer than the rest of the indexes (e.g. because we got the InRelease
from mirror A and the rest from the not-yet-updated mirror B).
The same goes for a mirror in the process of updating: some indexes
will be new, some will be old, and in the middle is the client,
unable to get out of the mess. Quite a few people are complaining
that downloading tens of megabytes from the internet "just" to end
with a hashsum mismatch, and then throwing away the downloads because
they are outdated and unusable, isn't ideal.
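
The reload idea and its limit can be sketched as follows. This is not
how apt behaves today; fetch_release() and fetch_index() are
hypothetical callables, and only sha256 is checked to keep it short:

```python
import hashlib


def update(fetch_release, fetch_index, max_rounds=3):
    """Fetch all indexes, reloading InRelease after a hashsum mismatch.

    fetch_release() returns a dict mapping index path -> sha256 hex digest.
    fetch_index(path) returns the index content as bytes.
    Returns a dict path -> bytes, or None if no consistent set was found.
    """
    release = fetch_release()
    for _ in range(max_rounds):
        indexes = {}
        for path, expected in release.items():
            data = fetch_index(path)
            if hashlib.sha256(data).hexdigest() != expected:
                break  # stop early instead of downloading the rest
            indexes[path] = data
        else:
            return indexes  # every index matched this InRelease
        fresh = fetch_release()
        if fresh == release:
            # InRelease did not change: the indexes lag behind (or the
            # mirror is mid-sync), so reloading cannot help here.
            return None
        release = fresh
    return None
```

The early return of None is the case described above: the client has
the newer InRelease and nothing it can do will make the old indexes
match.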

I hope we can improve the apt client in the process of the GSoC
project and beyond that, but it would be good to have an archive
which works reasonably well with a "dumb" client, too.

Best regards

David Kalnischkies
