
Re: Hashsum mismatch prevention strategies

>> I'm pretty sure Debian won't go below 6 hours any time soon, and if
>> we ever do, I wouldn't think hourly is a sane thing to do for an
>> archive like ours. Anyway, different topic.
> Sure, but think of all the kitties^Wderivatives: some might not be as
> strongly organized as Debian. Many probably follow a more relaxed "update
> the archive when needed" approach, which might mean a few days or a few
> minutes between updates at times. But yeah, that is a different topic…

I know. I run backports.d.o with a one-hour interval, and security with "random,
down to 5 minutes if the right things happen"...

>> Currently the translation breakage, though the latest ftpsync I released
>> yesterday should fix this a bit for the user experience.
> Thanks! This probably reduces the hit rate to a bearable level again,
> but I fear at least one user will keep hitting it [0] ;)
> (I really don't know how he does that…)

Well, he is Jidanni. He is just spethial.

>> Oh sure, just use staged pushing from ftpsync. :)
> What I meant was two mirrors on the same domain with different
> update times. One of the two will always be the one which has
> a file earlier than the other.

Now that is a broken setup.

>> - Only *ONE* compression for anything in dists/.
>>  To switch to another compression later, a second may be added for the
>>  next release, and as soon as that release is out, the old one goes
>>  away.[fn:1]
> FTR: in theory you can already do this with apt,
> minus the pdiffs, which are hardcoded as gzip files.
> (the indexes do not mention a compression type)

A thing to adjust? Well, it's in dists/, so it probably should adhere to the
same rules, unless we define an exception. A new line in Index,
"Compression: bla"?

>> - Only one "release" file; drop the old Release and Release.gpg.
>>  Would anything break right now if I dropped Release/Release.gpg
>>  for >>squeeze(-*)?
> Again, in theory you can do this with apt, as wheezy supports InRelease,
> uses it if available and only falls back to Release{,.gpg} if not.
> (as Debian's is basically the only archive with InRelease so far)

debian-cd would be one breakage, (c)debootstrap another. Enough to not
drop them right now.

>> - Do we need an extra Release file per binary-$arch, saying nothing more
>>  than what we already know from the location of the directory?
> As Julian noted in his draft mail apt doesn't use them.
> I don't know which tools might use them…
> They at least don't look that useful.

Turns out debian-cd does.

>> - Get anything that's not "an index" out of dists/ and keep it out. The
>>  installer is already on it; I started that thread before replying
>>  here, so that gets out. We should nail it down that nothing else will
>>  come in here in future, unless it's index stuff.
> Only related, but while reordering:
> As hinted in the first mail, Contents should be in
> dists/$(RELEASE)/$(COMPONENT)/binary-$(ARCH)

It would unclutter the component directories, yes.

> The only change is that e.g. apt-file search /non-free/firmware/file
> will not give a result anymore on systems without non-free.
> (which is a feature or a "bug" mostly depending on how libre you are).

You sure? If you have an apt-file that uses the old location in
dists/$suite, then you only have main anyway. They are symlinks into
main/. If you have an apt-file that downloads them for all components,
then it would need symlinks there too, for one release, and would still
find them all.

>> - Saner diffs. Now that one is a "fun" one, I know, but having something
>>  where you don't need to jump through dozens of very small files to end
>>  up with the final result, but have one and out comes the result, for
>>  example would be one thing. The sheer number of small .diffs makes it
>>  unusable as soon as you have large bandwidth. It would be nice(r), I
>>  think, if we could have something that lets you go from "x days ago to
>>  $now".
> (what follows is basically the short version of Goswin's reply, as I
>  didn't have his response while writing this one)

Ah, I happen to ignore him. It's usually better that way.

> There are old threads about that. apt already supports "skipping"
> patches, so if you order the indexes correctly you can already do
> that today (and I think reprepro supports creating this).
> But this doesn't remove small diffs, it adds more of them, as you need
> to provide a way from each mirror sync to the day-skipping diff.
> (at least if very short paths are desired)

> It might be better to tell apt to download all diffs in a row and merge
> them itself instead of downloading and applying each individually.
> (There is an old proof of concept for that, too. I just can't find it
> now)

I don't know which way would be better. The way it is now is just no
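The "download all diffs in a row and apply them in one go" idea from the quoted part can be sketched in Python. It assumes each pdiff is an ed script of the kind `diff --ed` produces (hunks ordered from the end of the file backwards, only `a`, `c` and `d` commands), already decompressed and sorted oldest first; `apply_ed_script` and `apply_series` are hypothetical names, not apt's code. The whole series is applied to one in-memory list of lines, so no intermediate index has to be written out per patch.

```python
import re

# "N" or "N,M" followed by a (append), c (change) or d (delete)
ED_COMMAND = re.compile(r"^(\d+)(?:,(\d+))?([acd])$")


def apply_ed_script(lines, script):
    """Apply one ed-style diff (as from `diff --ed`) to a list of lines.

    Assumes hunks are ordered from the end of the file to the start, which
    is what `diff --ed` emits, so earlier line numbers stay valid.
    """
    out = list(lines)
    commands = iter(script.splitlines())
    for command in commands:
        match = ED_COMMAND.match(command)
        if match is None:
            raise ValueError(f"unsupported ed command: {command!r}")
        start = int(match.group(1))
        end = int(match.group(2) or start)
        op = match.group(3)
        new_text = []
        if op in "ac":
            # Collect the text lines up to the terminating "."
            for text in commands:
                if text == ".":
                    break
                new_text.append(text)
        if op == "a":
            out[start:start] = new_text      # insert after line `start`
        else:
            out[start - 1:end] = new_text    # 'c' replaces, 'd' deletes
    return out


def apply_series(lines, scripts):
    """Apply a whole series of pdiffs in memory, oldest first."""
    for script in scripts:
        lines = apply_ed_script(lines, script)
    return lines


if __name__ == "__main__":
    base = ["Package: a", "Package: b", "Package: c"]
    patches = ["2c\nPackage: B\n.\n", "3d\n0a\nPackage: first\n.\n"]
    print(apply_series(base, patches))
    # ['Package: first', 'Package: a', 'Package: B']
```

A real merger would also have to check the hash of the result against the index listed in InRelease before using it.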

>> - One rsync run ought to be enough to mirror all of Debian (or any
>>  derivative using a similar structure). Not X, with various
>>  includes/excludes.
> Yeah, but how? I don't see a scenario in which updating InRelease
> too early is a recoverable situation (or at least a situation in which
> we don't download data we later can't validate).

Yes it is, if apt doesn't kill off the old files but keeps them, and only
moves the newly downloaded series over once it has validated it.

Yes I know, that increases the disk space used. But it would let apt
decide between the newest and one older.
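A rough Python sketch of that keep-the-old-set behaviour. `commit_if_valid`, the flat staging and lists directories and the `expected` mapping are all hypothetical; apt's real list handling works differently. Note that `os.replace` makes each single rename atomic, not the whole set, so a real implementation would still want something like a directory swap.

```python
import hashlib
import os


def sha256_of(path):
    """Return the hex SHA256 of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def commit_if_valid(staging_dir, lists_dir, expected):
    """Move freshly downloaded indexes into place only if all of them verify.

    `expected` maps file names to SHA256 hex digests (hypothetically taken
    from the new InRelease). If any file is missing or mismatches, nothing
    is moved and the previous set in `lists_dir` stays in use.
    """
    for name, want in expected.items():
        staged = os.path.join(staging_dir, name)
        if not os.path.exists(staged) or sha256_of(staged) != want:
            return False
    for name in expected:
        # Atomic per file on POSIX if both paths are on the same filesystem
        os.replace(os.path.join(staging_dir, name), os.path.join(lists_dir, name))
    return True
```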

Also, would it help if we just go and sign each and every
Packages/Sources/Translation file? We can still have a toplevel
InRelease that sticks em together as "all of those in this combination are
mirror run/archive state of time XY", but as long as the sig for file XY
works, you would know you got an "approved" file to work with.
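Today the closest thing to per-file approval is looking up each index in the signed InRelease. A minimal Python sketch of that check for a single file, assuming the OpenPGP signature was already verified and stripped (for example with gpgv) and parsing only the SHA256 field; `release_sha256` and `verify_index` are hypothetical helpers. With per-file signatures, the lookup would be replaced by a signature check on the file itself.

```python
import hashlib


def release_sha256(release_text):
    """Map index path -> (sha256, size) from the SHA256 field of a Release file.

    Assumes the signature (for InRelease) was already checked and stripped;
    only the deb822 body is parsed here.
    """
    entries = {}
    in_field = False
    for line in release_text.splitlines():
        if line.startswith(" ") and in_field:
            # Continuation line: " <hex digest> <size> <path>"
            digest, size, path = line.split()
            entries[path] = (digest, int(size))
        else:
            in_field = line.rstrip() == "SHA256:"
    return entries


def verify_index(entries, path, data):
    """True if `data` matches the hash and size listed for `path`."""
    if path not in entries:
        return False
    digest, size = entries[path]
    return len(data) == size and hashlib.sha256(data).hexdigest() == digest
```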

And then apt can deal with the Packages file. Combine it with a Translation
file whose new version has a hashsum mismatch compared to InRelease: heck,
the descriptions in the old one can still be used as long as the
Description-md5 is right, which is the main key into it anyway.

>>> Option B would be to introduce "versioned" components.

>> *hate*, sorry. That's just too ugly IMO.

>> Though a variation of that, doing it with "versioned" suites, I would
>> hate less. But still not nice.

>> Right now my tendency would go to a hash based tree for the indices
>> combined with hardlinks for the old tools and also the users.
> :) Fair enough. I realized that B hard-depends on mirrors adopting
> "clever" ftpsync scripts, which looked like a nice illusion for a
> while.

Debian mirrors at least are strongly encouraged to use ftpsync; there
aren't that many "homebrewed" ones anymore. Some are, yes, but usually
they are active enough to adjust when needed.
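The hash-based tree for the indices, combined with hardlinks for old tools and users, can be sketched in Python. The `by-hash/SHA256/` directory name is an assumption (it follows the layout APT's later by-hash support uses, but nothing here is taken from dak or ftpsync), `publish_by_hash` is a hypothetical helper, and expiring old hashes after a while is left out.

```python
import hashlib
import os


def publish_by_hash(index_path):
    """Hardlink an index into a content-addressed by-hash/SHA256/ directory.

    The file keeps its usual name for old tools. A client that knows about
    the hash tree fetches by-hash/SHA256/<digest> instead, so a mirror in
    mid-sync either has exactly the file InRelease names or none at all.
    """
    with open(index_path, "rb") as handle:
        digest = hashlib.sha256(handle.read()).hexdigest()
    target_dir = os.path.join(os.path.dirname(index_path), "by-hash", "SHA256")
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, digest)
    if not os.path.exists(target):
        os.link(index_path, target)  # same inode, no extra disk space
    return target
```

Because the link shares the inode, the old name and the hash name cost no extra space until the next run replaces the old name with a new file.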

> B was my on-the-spot invention to tackle problems I had with A.
> First, that the client doesn't know if the mirror will support it or
> not.

> Second, that a hash carries no useful information, but might be a
> nightmare to get rid of if we want to transition to another hash.
> (Besides, third, I don't like the idea of implementing something
>  which might or might not be adopted by anyone, or only by some, if it
>  is such a dramatic change, adding fuel to First)

> I had intermediate ideas of versioning the individual indexes and
> including them as usual in the InRelease file, but discarded them quickly
> with the silly idea that stuff I don't know a thing about (mirrors) might be
> easier to change than apt and all other clients.

> Adding multiple versions to the InRelease file increases its size quite
> a bit, up to a point where compression would become a topic, but this
> couldn't be done in a sane way, hence a global version tag. And as it
> seemed easier to version just a directory (e.g. the component) than all
> files (as this includes a multitude of compatibility links), B was born.

Why couldn't we compress InRelease?

bye, Joerg
<liw> I like shooting people
<liw> er, wait
<liw> that could be quoted out of context
