
Re: Hashsum mismatch prevention strategies

On 12852 March 1977, Julian Andres Klode wrote:

>> - Only *ONE* compression for anything in dists/.
>>   To switch to another compression later, a second may be added for the
>>   next release, and as soon as that release is out, the old one goes
>>   away.[fn:1]
>>   The current situation with needlessly doubling the information for
>>   years already just sucks.
> That should be easy for most cases. You can already drop bzip2 compression
> for Packages and Sources if you want to, I don't think anyone really
> cares about them.

Maybe some users do, but yes, tools in general should cope.

> For Translations, it might be more difficult, as they
> are currently only available in bzip2 and some might rely on this.

Yes. I would bet that is UDD, though it shouldn't be hard to change.

> Once you have dropped the bzip2 compressed indices, you might already
> have gained enough space to keep an older generation of the indices
> around.

Guess so.

>> - Only one "release" file, drop away the old Release and Release.gpg.
>>   Would anything break right now if I would drop the Release/Release.gpg
>>   away for >>squeeze(-*)?
> Cupt and Smart do not support InRelease files yet. Smart will probably
> get support for them when Ubuntu introduces them, as Canonical is
> involved in Smart development and uses Smart for their Landscape
> stuff. For Cupt, see bug 623113. Likewise but maybe even more
> important, debootstrap and cdebootstrap would both break.

Oh well, fix cupt. :)

Yeah, debootstrap already had a bug, #638682; cdebootstrap had none, so
I filed one, #673625. Both are now severity important, as I think this
warrants more than just normal, but it is not RC.

>> - Do we need an extra Release file per binary-$arch, saying nothing more
>>   than what we already know from the location of the directory?
> At least APT does not need it, and dselect does not appear to need
> it either.

Could just drop it and see who complains.
I'll check our code later to see how hard it would be to do that in
experimental only as a start.

>> - Hey, if we are at it, wth binary-$arch, let's rename to $arch only.
> I don't think we need to deliberately introduce incompatibilities.

Hapüh. (And yes, I knew that's not going to happen soon.)

>> - Saner diffs. Now that one is a "fun" one, I know, but having something
>>   where you don't need to jump through dozens of very small files to end
>>   up with the final result, but have one and out comes the result, for
>>   example would be one thing. The sheer number of small .diffs makes it
>>   unusable as soon as you have large bandwidth. It would be nice(r), i
>>   think, if we could have something that lets you go from "x days ago to
>>   $now".
> You can do this without any format change, just by changing the
> algorithm. reprepro already creates diffs from past to current, instead
> of incremental ones.

The merged diffs are something to check, though more in terms of "do we
make dists/ explode again or not" and "dak's pdiff creation sucks and
couldn't handle this yet, needs new code".

>> > Option A is that each mirror (if it chooses to do it) builds a big "index" of
>> > hashsum-named hardlinks to the "old" location of the file. Given a
>> > repository like this:
>> I am against doing stuff outside the archive. We should have something
>> that we say "this is it. mirror it. be done.". Not "this is it. mirror
>> it. now do process XY".
> You can also do this in the archive, it does not really have to be
> done on the mirror.

I know that.

>> > So, in short: What do you think? Is there an option C or are there
>> > features/problems in A or B which I have omitted/overlooked?

>> As you see I don't have a written-out C right now.
> There is a third option in the Ubuntu pad, that is basically suffixing
> the indices with the hash, let's say
> 	Packages-d86236a0c540b340986c99e94d0d9159c66b96a34adc2de01e2668f2d3a2ded2.gz
> (using SHA256 here). This approach is relatively easy in my opinion. We can
> then also add a field to the Release file saying (just for optimisation
> purposes, instead of blindly trying the hashes):
> 	Indices-Hashing: sha256
> (although we can just look at e.g. SHA256 as well and see if the hash
> link is listed in there) and are basically done. We just keep two of
> those files around, one for the past state, one for the current state.
> It also seems closely related to the stuff RPM people do.

> Another option is to have a file Packages.old.gz, and APT then simply
> fetches that one when it notices that Packages.gz is wrong. Should
> work as well.

> The third option is to do the same .old stuff, but do a
> 	cp -al dists dists.new
> 	rsync master:dists ... to  ... dists.new
> 	mv dists dists.old
> 	mv dists.new dists

> And then fallback to dists.old if there is something wrong
> in dists. This should be atomic enough for everyone, and easy
> to implement.

Hrm. That's similar to Option A in the end result. But I like it more.

If we suffix the indices with their hash, and keep the last two (or
three?) versions of them around AND in the InRelease file, then clients
can do clever handling of them and also shouldn't end up with mismatches
anymore. We can keep the files, or just their checksums.
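
To make the archive side of that a bit more concrete, here is a rough
sketch in Python. It is not what dak does; hashed_name and IndexHistory
are made-up names for illustration, and it assumes the hash is taken over
the index file exactly as stored (i.e. the compressed bytes) and that we
list the newest generations last.

```python
import hashlib
from collections import deque


def hashed_name(basename: str, ext: str, data: bytes) -> str:
    """Build a hash-suffixed index name, e.g. Packages-<sha256>.gz."""
    digest = hashlib.sha256(data).hexdigest()
    return f"{basename}-{digest}{ext}"


class IndexHistory:
    """Remember the last `keep` generations of one index file.

    Hypothetical helper: the names it returns are what would be listed
    in the InRelease file (oldest first, current last).
    """

    def __init__(self, keep: int = 2):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self.names = deque(maxlen=keep)

    def publish(self, data: bytes) -> str:
        name = hashed_name("Packages", ".gz", data)
        # An unchanged index must not push an older generation out.
        if name not in self.names:
            self.names.append(name)
        return name

    def listed(self) -> list:
        return list(self.names)
```

With keep=2 the third publish drops the oldest generation, which is the
"last two (or three?)" knob from above.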

When a mirror run happens while you run apt-get update, you get the new
InRelease file, but the Packages/Sources files you fetched don't match
it. You can still verify that they match the previous version, and use
that.
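
On the client side the check could look roughly like the following
Python sketch. This is not APT code; match_generation is a hypothetical
helper, and it assumes the InRelease file lists the SHA256 sums of the
current and previous generations, current first.

```python
import hashlib
from typing import List, Optional


def match_generation(data: bytes, listed_sha256: List[str]) -> Optional[int]:
    """Return which listed generation `data` matches.

    0 means the current generation, 1 the previous one, and so on.
    None means a real hashsum mismatch: the file matches nothing that
    the signed InRelease file vouches for.
    """
    digest = hashlib.sha256(data).hexdigest()
    for generation, expected in enumerate(listed_sha256):
        if digest == expected:
            return generation
    return None
```

A client that gets 1 back knows it caught the mirror mid-update: the
index is older than the InRelease file but still verified, so it can be
used now and refreshed on the next update.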

bye, Joerg
You know, boys, a nuclear reactor is a lot like a woman. You just have
to read the manual and press the right buttons.
