
Re: Hashsum mismatch prevention strategies



David Kalnischkies <kalnischkies@gmail.com> writes:

> On Sun, May 20, 2012 at 2:35 AM, Goswin von Brederlow <goswin-v-b@web.de> wrote:
>> Joerg Jaspert <joerg@debian.org> writes:
>>> On 12843 March 1977, David Kalnischkies wrote:
>> I would like to extend the format of the Index file there so that a
>> single entry can list multiple patches like this:
>>
>> SHA1-Current: ced0270ddecf73e18ed604a632cd201d8006980e 29140592
>> SHA1-History:
>>  333d3ea2262f49befe07069df22390277587a4fb 29094219 2012-05-06-0214.07 2012-05-06-0813.32 2012-05-07-0813.32
>>  5bca4bd997d4466c9f7df3fcf77a21e02a152c12 29094451 2012-05-06-0813.32 2012-05-07-0813.32
>>  2fbf8424e650310c7cd9cf20070543ea036a39bf 29094377 2012-05-06-1414.33 2012-05-06-2013.13 2012-05-07-0813.32
>>  b08a3149b0be7201e98d74c699c0d5a33d493e84 29105545 2012-05-06-2013.13 2012-05-07-0813.32
>>  d6fb28816fffb9b74ae60c82d503cc79d531185c 29112085 2012-05-07-0213.45 2012-05-07-0813.32
>>  359bd6a4bf838c87d9d19b10e41a576784810237 29111799 2012-05-07-0813.32
>>  8b7482acdeda7514ad887720bda05ab9cf9d7abc 29111794 2012-05-07-1413.33 2012-05-07-2013.52 2012-05-08-0817.24
>>  6d9b7ccb8e2e6c066ac80e779d64d1f523042ae9 29112045 2012-05-07-2013.52 2012-05-08-0817.24
>>  12e2733c9a34fff1c86735b266ff315e84a78fc5 29112958 2012-05-08-0213.34 2012-05-08-0817.24
>>  ef7882176b5277e5be7b38e396c69c8f8167ac35 29117503 2012-05-08-0817.24
>>  380c5683e0942266c0570f77c989db2e565ca799 29120576 2012-05-08-1413.33 2012-05-08-2018.17
>>  cad793bca1f269d5f1ec68df1bc59931d7321098 29125630 2012-05-08-2018.17
>>  1799dbc63e7856d42cec3eb6b4192290f007bc9a 29127145 2012-05-09-0213.20
>>
>> So instead of a single patch file a list of patch files is given. The
>> above example would be for the above log(N) method. With the current
>> totally incremental files the lines would have 1-56 patches listed.
>
> The old proposal I remember introduced a new section SHA1-Results mentioning
> the sha1sums of the Packages file after the patch is applied. This way a
> client can build up its own path and old clients can continue to function
> with this new style (just keep being slower) - otherwise we would need to
> wait for all clients supporting this new scheme to reach oldstable, or
> stable users would be unable to have unstable in their sources.list.

That puts marginally more work on the client but is indeed better. Who
has access to a DAK to write a patch for this?
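
To make the "client builds up its own path" part concrete, here is a
rough sketch. It assumes the client has already joined SHA1-History with
the proposed SHA1-Results into (hash before, hash after, patch name)
tuples; that tuple layout and the function name are made up for
illustration and are not an existing apt or dak interface. The path
search is a plain breadth-first search, so it picks the fewest patches.

```python
from collections import deque

def find_patch_path(edges, start, target):
    """Return the shortest list of patch names leading from start to target.

    edges: iterable of (hash_before, hash_after, patch_name) tuples
    (hypothetical layout, see the note above).
    Returns [] if start == target and None if no path exists.
    """
    graph = {}
    for before, after, patch in edges:
        graph.setdefault(before, []).append((after, patch))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        current, path = queue.popleft()
        if current == target:
            return path
        for after, patch in graph.get(current, []):
            if after not in seen:
                seen.add(after)
                queue.append((after, path + [patch]))
    return None  # no chain of patches reaches the target
```

An old client that only knows the one-patch-per-step layout would simply
walk the longest chain; a new client gets the shortcut for free.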

>> Option C)
>> =========
>>
>> The "Mirrors are going to be out-of-sync. Deal with it." option.
>>
>> Instead of coming up with more and more complex repository layouts or
>> mirror scripts why not just accept that sometimes mirrors will be
>> out-of-sync and think of a way we can help the client to recover.
>>
>> To recover the client needs to get hold of the correct files. So let's
>> add a location where the client can request them. In InRelease we add
>>
>> Hash-Service: http://hash.debian.org/hash?h=#
>>  ftp://hash.debian.org/hash/#2/#
>>  rsync://hash.debian.org/hash/#2/#
>>
>> Now when a client needs to download a Packages/Sources/PDiff/Translation
>> file with SHA1 863c99eb851479720c8930b31a9e85b6535ddab9 and the mirror
>> does not have the correct file the client can download one of
>>
>> http://hash.debian.org/hash?h=863c99eb851479720c8930b31a9e85b6535ddab9
>> ftp://hash.debian.org/hash/86/863c99eb851479720c8930b31a9e85b6535ddab9
>> rsync://hash.debian.org/hash/86/863c99eb851479720c8930b31a9e85b6535ddab9
>>
>> In Hash-Service a '#' means the full SHA1 checksum and #<X> the first X
>> characters of the SHA1 checksum.
>>
>> This service would only be used when a mirror is out-of-sync and only
>> for the index files. So traffic should be much less than mirrors usually
>> have.
>
> The problem is, usually the mirror has the new indexes (Packages, ...) and an
> old InRelease file, hence it has acquired quite a few MB of data (in the worst
> case) which can't be validated. What I need here is a way to get the "new"
> InRelease file, not a way to get the old indexes as I already have the new
> ones - I would need to download the old indexes completely as there is no
> possibility to "downgrade" the new indexes.

Actually that shouldn't be a problem if you use PDiff files. The
patching will just stop one step short of the end so that Packages will
match the InRelease file. But the PDiff index file is probably out of sync
too, so it is skipped (right?) and the full file downloaded instead.

With the Hash-Service the PDiff index file would be repaired from
hash.debian.org and then Packages can be patched using the pdiff files
from the original server again. So normally you shouldn't get into the
case where the out-of-sync Packages file is downloaded at all.
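
For illustration, expanding a Hash-Service template by the rule quoted
above ('#' is the full checksum, '#<X>' the first X characters) could
look roughly like this. The function name is made up; only the template
syntax comes from the proposal.

```python
import re

def expand_hash_service(template, checksum):
    """Replace '#<X>' with the first X characters of checksum and a
    bare '#' with the whole checksum."""
    def substitute(match):
        digits = match.group(1)
        return checksum[:int(digits)] if digits else checksum
    # '#' optionally followed by digits; the digits give the prefix length.
    return re.sub(r"#(\d*)", substitute, template)

sha1 = "863c99eb851479720c8930b31a9e85b6535ddab9"
print(expand_hash_service("http://hash.debian.org/hash?h=#", sha1))
print(expand_hash_service("ftp://hash.debian.org/hash/#2/#", sha1))
```

The two calls give the http and ftp URLs from the example in the
proposal.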

But say you do, for whatever reason (pdiff disabled or broken). Then
yes, you've downloaded a few MBs of data that can't be validated and are
useless, and you will end up downloading yesterday's again. Wasteful:
yes. But it works. The PDiff files are there to avoid the waste, and as
a fallback I think it is good enough.

> If we really get the new InRelease file and old indexes, we still have the
> problem that the old indexes can't be validated before we work with them
> (= to apply a patch we need to decompress them. A security bug in the used
>  compression type and we have a problem?).

The index files are just like any other file. If the file is broken then
fetch it from hash.d.o.
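
In other words, something along these lines, as a minimal sketch only.
The sources are passed in as plain callables so the example does not
touch the network; how apt would actually do the fetching is not
covered here, and the function name is hypothetical.

```python
import hashlib

def fetch_verified(expected_sha1, sources):
    """Try each source in order and return the first payload whose
    SHA1 matches the checksum listed in InRelease.

    sources: list of zero-argument callables returning bytes, e.g. the
    mirror first and the hash service second (hypothetical interface).
    """
    for fetch in sources:
        try:
            data = fetch()
        except OSError:
            continue  # source unreachable, try the next one
        if hashlib.sha1(data).hexdigest() == expected_sha1:
            return data
        # Checksum mismatch: the source is out of sync, fall through.
    raise ValueError("no source delivered a file matching " + expected_sha1)
```

The mirror stays the first choice, so the hash service only sees
traffic when the mirror's copy does not match.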

> Besides that it perfectly shows why I dislike using hashes:
> SHA1 is not an ideal choice anymore, so at some point we should move to
> something else, leaving us with a transition and therefore a fallback
> system behind even though any random number would have done the trick.

Yes, any random number will do the trick. The random number I chose was
the one given in InRelease as checksum of the file. But feel free to add
a new entry giving each file a forever unique random (or continuous)
number.

I don't see a problem with using the already existing checksums for this
purpose. Over time older checksums (like md5) will be discontinued and
new checksums will be added. But we already need a transition to new
checksums and need old+new checksums to exist for a transitional period
so clients can change the way they verify files. There is no reason you
can't have

http://hash.debian.org/hash?t=md5&h=863c99eb851479720c
http://hash.debian.org/hash?t=sha1&h=863c99eb851479720c8930b31a9e85b6535ddab9
...

Different checksums can coexist. The idea remains the same. The client
can use the strongest checksum it knows about.
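
A sketch of that selection, assuming the t=/h= query parameters from the
URLs above. The strength ordering, the "sha256" entry and the function
name are assumptions for the example, not part of the proposal.

```python
from urllib.parse import urlencode

# Weakest to strongest; ordering and type names are assumptions.
STRENGTH_ORDER = ["md5", "sha1", "sha256"]

def hash_service_url(base, checksums, supported=STRENGTH_ORDER):
    """Build a hash-service URL using the strongest checksum type that
    is both listed for the file and supported by the client.

    checksums: dict mapping checksum type to hex digest.
    """
    for kind in reversed(STRENGTH_ORDER):
        if kind in checksums and kind in supported:
            return base + "?" + urlencode({"t": kind, "h": checksums[kind]})
    raise ValueError("no mutually supported checksum type")
```

An old client that only knows md5 passes supported=["md5"] and keeps
working for as long as md5 is still listed.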

MfG
        Goswin

