Re: These new diffs are great, but...
Florian Weimer <firstname.lastname@example.org> writes:
> * Marc Haber:
>> The machine in Question is a P3 with 1200 MHz. What's making the
>> process slow is the turnaround time for the http requests, as observed
>> multiple times in this thread alone.
> Then your setup is very broken. APT performs HTTP pipelining.
Actually it does NOT, judging by what strace shows me. The apt http method
uses keep-alive but not pipelining. For example apt-get source bash
will send a GET request, read the file, send the next GET, read the
file, send the third GET, read that file. With pipelining it should
send all 3 GETs at once or at least intermixed with reading the files.
But even pipelining would not help here, since the pdiff files are
not queued up with the http method in advance but requested one after
the other.
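To make the difference concrete, here is a rough Python sketch of the two
request orderings. It only models the order of events on one connection,
it is not what the apt http method does internally, and the paths in the
usage note are made-up placeholders.

```python
def keepalive_order(paths):
    """One connection, but each GET waits for the previous response."""
    events = []
    for path in paths:
        events.append(("send", path))
        events.append(("recv", path))
    return events


def pipelined_order(paths):
    """All GETs are written first, then the responses are read in order."""
    return [("send", p) for p in paths] + [("recv", p) for p in paths]


def idle_waits(events):
    """Count how often the client switches from sending to waiting.

    Each such switch costs roughly one network round trip.
    """
    return sum(
        1
        for prev, cur in zip(events, events[1:])
        if prev[0] == "send" and cur[0] == "recv"
    )
```

For three hypothetical files such as ["/a.dsc", "/a.diff.gz", "/a.orig.tar.gz"],
idle_waits() counts three waits for the keep-alive ordering and one for the
pipelined ordering.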
> On my machines, I see the behavior Miles described: lots of disk I/O.
> Obviously, APT reconstructs every intermediate version of the packages
Yes, I noticed that too. Patching a 15MB Packages file takes a lot of
time. Even on a modern amd64 system the rred runs are usually slow
enough that you can watch the progress.
> The fix is to combine the diffs before applying them, so that you only
> need to process the large Packages file once. I happen to have ML
> code which does this (including the conversion to a patch
> representation which is more amenable to this kind of optimization)
> and would be willing to port it to C++, but someone else would need to
> deal with the APT integration because I'm not familiar with its
What code do you need there? If the rred method keeps the full Index
file in memory during patching, it can just be fed all the patches one
after another and only has to write out the final result at the
end. Combining the patches is a simple cat.
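A minimal Python sketch of that idea, assuming the patches are ed-style
scripts (as produced by diff --ed) that use only the a, c and d commands.
The function names are made up for illustration, this is not the rred
code, and the "s/.//" escape for lines consisting of a single dot is not
handled.

```python
import re

# Matches "N", "N,M" followed by one of the commands a, c, d.
_CMD = re.compile(r"^(\d+)(?:,(\d+))?([acd])$")


def apply_ed_script(lines, script):
    """Run ed commands in order against an in-memory list of lines."""
    buf = list(lines)
    cmds = script.splitlines()
    i = 0
    while i < len(cmds):
        m = _CMD.match(cmds[i])
        if m is None:
            raise ValueError(f"unsupported ed command: {cmds[i]!r}")
        start = int(m.group(1))
        end = int(m.group(2) or start)
        op = m.group(3)
        i += 1
        new_text = []
        if op in "ac":
            # Text block runs up to a line containing only ".".
            while i < len(cmds) and cmds[i] != ".":
                new_text.append(cmds[i])
                i += 1
            i += 1  # skip the terminating "."
        if op == "a":
            buf[start:start] = new_text      # insert after line `start`
        elif op == "c":
            buf[start - 1:end] = new_text    # replace lines start..end
        else:
            del buf[start - 1:end]           # delete lines start..end
    return buf


def apply_all(lines, patches):
    """Concatenate the patches (the "cat") and apply them in one pass."""
    return apply_ed_script(lines, "".join(patches))
```

Because every command addresses the buffer as it is at that moment,
feeding the concatenated patches gives the same result as applying them
one at a time, and the file only has to be written once, from the
returned list.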