Re: Hashsum mismatch prevention strategies
On Monday 21 May 2012 02:12:06 Julian Andres Klode wrote:
> On Sun, May 20, 2012 at 06:30:06PM +0000, Raphael Geissert wrote:
> > Goswin von Brederlow wrote:
> > I'm not even sure a new field needs to be introduced. It's just a
> > matter of stating that the fields are ordered and if you have hash X
> > you need all the patches mentioned in that line and the ones that
> > follow.
> Assuming that would break reprepro repositories.
Alright, so a new field *is* needed.
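For reference, the ordered-fields reading quoted above would amount to
something like the following Python sketch. The (hash, patch name) pair
layout and the sample values are assumptions made for illustration only;
they are not the actual Index format.

```python
def patches_needed(history, current_hash):
    """Return the patches to apply, in order, for a file with current_hash.

    history is an ordered list of (hash_before_patch, patch_name) pairs,
    oldest first. The pair layout is hypothetical, not the real Index format.
    """
    for position, (known_hash, _name) in enumerate(history):
        if known_hash == current_hash:
            # The matching line and every line after it must be applied.
            return [name for _hash, name in history[position:]]
    # Unknown hash: no patch chain applies, so fetch the full file instead.
    return None


# Hypothetical history, oldest entry first.
history = [
    ("aaa111", "2012-05-18-1400.12"),
    ("bbb222", "2012-05-19-0200.07"),
    ("ccc333", "2012-05-20-0200.31"),
]
print(patches_needed(history, "bbb222"))
```

A client that relies on this ordering gets the wrong patch list from any
repository that does not guarantee it, which is the breakage mentioned
above.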
> > Additionally, I'd like an alternative way to distribute the pdiffs to
> > be considered: after n days (say 2), gunzip the patches worth of one
> > day and tgzip them. This not only reduces the number of requests
> > needed to download all the files, but it also provides better
> > compression. Adding an Index file to the tarball would be enough for
> > apt to know which ones to apply and which ones to ignore.
> A tar in between would complicate the code on the client, and break
> backwards compatibility.
Until recently the archive would not provide enough diffs to allow Packages
files older than a couple of days to be reused. Returning to that behaviour
for clients that don't support tar-ed diffs wouldn't change much. I even dare
to speculate that clients who have an unmetered connection will welcome
that, as it currently takes considerably more time to download 10 or so
pdiffs than to download the whole Packages file.
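To make the tar-ed pdiff idea concrete, here is a rough Python sketch of
bundling one day's worth of uncompressed patches into a single gzip-ed
tarball with an Index member. The member names, the one-name-per-line Index
layout and the sample patch contents are all assumptions of mine, not an
agreed format.

```python
import io
import tarfile


def bundle_patches(patches):
    """Pack {name: uncompressed_bytes} into one gzip-compressed tarball.

    A hypothetical "Index" member listing the patch names in application
    order (one per line) is stored first.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        index = "\n".join(patches).encode() + b"\n"
        members = [("Index", index)] + list(patches.items())
        for name, data in members:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def read_index(bundle):
    """Return the patch names listed in the bundle's Index member."""
    with tarfile.open(fileobj=io.BytesIO(bundle), mode="r:gz") as tar:
        return tar.extractfile("Index").read().decode().split()


# Hypothetical patches for one day, in application order.
day = {
    "2012-05-18-0200.07": b"1c\nfoo\n.\n",
    "2012-05-18-1400.12": b"2d\n",
}
bundle = bundle_patches(day)
print(read_index(bundle))
```

The client would make one request for the bundle, read the Index, and skip
the members it does not need.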
The whole indexdiff logic could be moved away from apt-pkg to a method that
does the right thing. The main bit that is missing is the ability for a
method to issue sub-requests without instantiating the method directly
(which is what the mirror method does.)
How does that sound?
> > What I had proposed on irc was a combination of a) and b), sort of:
> > Let's call it option D:
> > * Only the InRelease files have a constant name
> > * All the other files have a date or some other sort of serial number
> > appended, e.g. Packages-12042014
> > * Compatibility symlinks are kept in place, but it is known they will
> > be prone to race conditions (404s, even).
> > * APT and others find the names of the latest available indexes from
> > the InRelease file
> > * InRelease becomes the one and only place at which a mirror "switches"
> > from one push to another.
> Which is basically the same I proposed as well [and what is option
> C from the Ubuntu discussion].
Right, I had not read your email by then. Using a hash is probably overkill,
and if there's even the possibility of gaining something from rsync's
--fuzzy, a hash-based naming format would kill it.
According to rsync's code, it uses a modified Levenshtein edit distance, with
a limit of 25 edits. Normally, one could expect the edit distance of two
hashes to be near the length of the hash.
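A quick way to check this is the Python sketch below. It uses the plain
Levenshtein distance, not rsync's modified variant, and the file names and
hashed contents are made up for the comparison.

```python
import hashlib


def levenshtein(a, b):
    """Plain Levenshtein edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


# Date-suffixed names from two consecutive pushes (hypothetical dates).
dated = levenshtein("Packages-12042014", "Packages-13042014")

# Hash-suffixed names; the hashed contents are placeholders.
old = "Packages-" + hashlib.sha1(b"old index contents").hexdigest()
new = "Packages-" + hashlib.sha1(b"new index contents").hexdigest()
hashed = levenshtein(old, new)

print("dated names:", dated)
print("hashed names:", hashed)
```

The dated names differ by a single substitution, while the hashed names can
differ in up to all 40 hex digits of the SHA-1 suffix.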
Raphael Geissert - Debian Developer
www.debian.org - get.debian.net