
Re: Hashsum mismatch prevention strategies



On Mon, May 21, 2012 at 9:07 AM, Raphael Geissert <geissert@debian.org> wrote:
> David Kalnischkies wrote:
> [...]
>> (Unfortunately nobody at Google noticed the irony yet that they enable
>>  pipelining in chromium by default ~3 weeks ago, but deliver chromium
>>  over a broken server… not to mention SPDY which has an emphasis on
>>  multiplexing and pipelining…)
>
> Filing a report against chromium (.org) might help. At least it is easier to
> ask somebody who works at Google to try to contact somebody who might be
> able to do something about it once there's a report somewhere.

There once was [0], which is claimed to be fixed but is still mentioned
every once in a while. And there is [1], which is just "random" high latency
that is at times claimed to be fixed by disabling pipelining (and just as
often claimed to be unaffected by pipelining).


As I use neither chromium nor the affected repository properly, I can't
confirm (or refute) the fix for either of the issues. I just like the
possibility here, maybe also because I recently stumbled over a mail… [2]

[0] http://code.google.com/p/chromium/issues/detail?id=38608
[1] http://code.google.com/p/chromium/issues/detail?id=93409
[2] http://lists.debian.org/deity/1998/02/msg00027.html

Anyway, that is pretty off-topic…


>> In the end apt-get (and friends) breaks unreproducibly "sometimes" with
>> a pretty indistinguishable error message, so a user will probably never
>> understand what is going on. The benefit from pipelining isn't great
>> enough to justify this - at least in my eyes (after all, high-latency
>> connections with high bandwidth aren't that common - usually you have a
>> low/high or high/low combination. In the latter the benefit from
>> pipelining is easily eaten up by the size of the files we need to
>> acquire in general).
>
> Do you happen to know what the usual symptoms are, besides the out-of-order
> responses?

Anything. I have seen (records of) the B C A case as well as a server just
responding with A and forgetting completely about B and C. The latter is
handled. The more problematic cases are a lone C as the only answer, and a
response for A which is basically A, B and C "nicely" combined in one batch,
as if three threads were sending a response at once.
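To make the failure mode concrete, here is a small sketch (not apt code, just a toy model) of why a reordering server produces hashsum mismatches: pipelined HTTP responses must arrive in request order, so the n-th body is checked against the n-th request's expected hash.

```python
import hashlib

# Toy model: the client pipelines requests for A, B, C and checks each
# arriving body against the expected hash of the next unanswered request.
# A broken server answering B C A makes every check fail, even though
# every byte arrived intact.
files = {"A": b"contents of A", "B": b"contents of B", "C": b"contents of C"}
expected = {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}

request_order = ["A", "B", "C"]          # what the client pipelined
broken_response_order = ["B", "C", "A"]  # what a broken server sends back

mismatches = []
for requested, answered in zip(request_order, broken_response_order):
    got = hashlib.sha256(files[answered]).hexdigest()
    if got != expected[requested]:
        mismatches.append(requested)

print(mismatches)  # -> ['A', 'B', 'C']
```

Note how the user-visible symptom is three hashsum mismatches with no hint that the data itself was fine - which is exactly why the error message is so hard to act on.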


> I'm thinking that, given that the downloaded data has already been hashed
> (and hence the mismatch is detected), the method could check if it matches
> the hash of one of the other requested files.
> There might be some issues with the cyclic queues that are used, but they
> shouldn't be too hard to fix.
>
> What do you think?
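Raphael's proposal above can be sketched roughly like this (a hypothetical illustration, not apt's actual queue handling - function and variable names are invented): on a mismatch, try the payload's hash against the hashes of all in-flight requests, and only give up if it matches none of them.

```python
import hashlib

def reassign(payloads, expected):
    """payloads: response bodies in arrival order.
    expected: dict of filename -> sha256 hex digest, one per in-flight request.
    Returns filename -> data if every payload matches *some* request,
    or None if any payload is genuinely corrupt."""
    by_hash = {digest: name for name, digest in expected.items()}
    assignment = {}
    for data in payloads:
        name = by_hash.get(hashlib.sha256(data).hexdigest())
        if name is None:
            return None  # matches no requested file: real corruption
        assignment[name] = data
    return assignment

expected = {name: hashlib.sha256(data).hexdigest()
            for name, data in {"A": b"AA", "B": b"BB", "C": b"CC"}.items()}

# Reordered but intact responses can be rescued...
print(reassign([b"BB", b"CC", b"AA"], expected))  # every payload finds its file
# ...while truly corrupt data still fails:
print(reassign([b"garbage"], expected))  # -> None
```

The cyclic-queue caveat from the quote would show up here as bookkeeping: once a payload is reassigned, the method's idea of "next expected response" has to be updated to match.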

I wonder how opera and now chromium handle this. They don't have hashsums
and still seem to cope… I know that firefox had some built-in proxy/webserver
blacklist, but not that many debian mirrors will run on IIS, I guess…


The problem is that it is not the download method deciding that it has a
hashsum mismatch but the acquire-item.cc classes, which don't know what the
method has in its pipeline - and that pipeline could change while we are
still thinking about the mismatch, as the method runs in a different
process. There might be some way around that with a bit more thinking, but
I have the gut feeling that this could end up being quite a bit of code.


The alternative - telling the methods the expected values (hashes, size, …)
upfront - might be nicer. Especially the http method could use the size (if
known) to compare against Content-Length (if sent by the server) and know,
before the download completes, whether the data we got belongs to the file
we requested. (Size mismatches very likely produce hashsum mismatches, too.)
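A minimal sketch of that early check (hypothetical, not apt's method interface): with the expected size known, a wrong response can be rejected as soon as the headers arrive, instead of after downloading and hashing the whole body.

```python
def early_size_check(expected_size, headers):
    """Return False if the response demonstrably cannot be the requested
    file; True means "keep downloading" (the hash check still has the
    last word). headers is a plain dict of response header fields."""
    content_length = headers.get("Content-Length")
    if content_length is None:
        return True  # no Content-Length (e.g. chunked): cannot decide early
    return int(content_length) == expected_size

print(early_size_check(1024, {"Content-Length": "1024"}))  # -> True
print(early_size_check(1024, {"Content-Length": "2048"}))  # -> False
print(early_size_check(1024, {}))                          # -> True (fall back to hashing)
```

As the text notes, this is only a fast-path rejection: equal sizes say nothing about equal content, so the hashsum check remains the authority.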

[Noted both as an idea to test for the next abi break]


>> Also, currently pdiffs aren't downloaded in a pipelineable fashion, so
>> this isn't even a regression in this regard, but would be an added
>> improvement in case we come to a point at which pipelining is enabled
>> by default again.
>
> Yes, hence my comment about not gaining _that much_ if all the necessary
> pdiffs are known in advance, if pipelining is disabled.

You still have the (big) benefit of not patching the big Packages file
x times, but instead merging the x patches together and patching Packages
only once. The download of these small patch files is usually free compared
to the time "wasted" on reading, writing and hashing the "new" Packages file
just to replace it with an even newer Packages seconds later…
(If this patch-merging is implemented in the clients, of course - but even
 if patching is done in parallel to downloading, the benefit is okay, too.)
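The merge-then-patch-once idea can be illustrated with a toy model (real pdiffs are ed-style scripts whose line numbers shift under insertion/deletion, so real merging is more involved; here patches are simple line replacements keyed by index):

```python
def apply_patch(lines, patch):
    """patch: dict of line index -> replacement line (a toy stand-in
    for a pdiff). One call = one full read/write pass over the file."""
    out = list(lines)
    for idx, new_line in patch.items():
        out[idx] = new_line
    return out

def merge_patches(patches):
    """Compose replacement patches; later patches win, exactly as in
    sequential application. Valid only because these toy patches never
    shift line numbers."""
    merged = {}
    for patch in patches:
        merged.update(patch)
    return merged

packages = ["pkg-a 1.0", "pkg-b 1.0", "pkg-c 1.0"]
patches = [{0: "pkg-a 1.1"}, {0: "pkg-a 1.2", 2: "pkg-c 1.1"}]

# x passes over the big file...
sequential = packages
for patch in patches:
    sequential = apply_patch(sequential, patch)

# ...versus one pass over the big file after merging the small patches:
one_pass = apply_patch(packages, merge_patches(patches))
print(sequential == one_pass)  # -> True
```

The saving is exactly the point made above: the expensive operation is each pass over the large Packages file, while combining the small patches among themselves is cheap.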


Best regards

David Kalnischkies

