Re: These new diffs are great, but...

To: Florian Weimer <fw@deneb.enyo.de>
Cc: Marc Haber <mh+debian-devel@zugschlus.de>, debian-devel@lists.debian.org
Subject: Re: These new diffs are great, but...
From: Goswin von Brederlow <brederlo@informatik.uni-tuebingen.de>
Date: Fri, 07 Jul 2006 14:36:38 +0200
Message-id: <871wsxvbft.fsf@informatik.uni-tuebingen.de>
In-reply-to: <87hd22sj2j.fsf@mid.deneb.enyo.de> (Florian Weimer's message of "Fri, 30 Jun 2006 18:29:40 +0200")
References: <20060629180509.GB1986@yi.org> <20060629183541.GA11648@piper.madduck.net> <20060629183836.GA5061@uio.no> <20060629184345.GG1986@yi.org> <E1FwC3l-0002TZ-P2@scyw00225.scy001.de> <20060630091037.GA13596@rotes76.wohnheim.uni-kl.de> <E1FwJCZ-0006yZ-JA@scyw00225.scy001.de> <87hd22sj2j.fsf@mid.deneb.enyo.de>

Florian Weimer <fw@deneb.enyo.de> writes:

> * Marc Haber:
>
>> The machine in Question is a P3 with 1200 MHz. What's making the
>> process slow is the turnaround time for the http requests, as observed
>> multiple times in this thread alone.
>
> Then your setup is very broken.  APT performs HTTP pipelining.

Actually it does NOT, from what strace shows me. The apt http method
uses keep-alive but not pipelining. For example, apt-get source bash
sends a GET request, reads the file, sends the next GET, reads that
file, sends the third GET, and reads that file. With pipelining it
should send all 3 GETs at once, or at least interleave them with
reading the files.
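
To make the difference concrete, here is a minimal sketch of what goes
onto the wire in each case. It only builds the request bytes and does
not open a connection; the host and file names are hypothetical
placeholders, not what apt actually requests.

```python
def build_get(host: str, path: str) -> bytes:
    """Serialize one HTTP/1.1 GET request (connections are persistent by default)."""
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")


def keep_alive_bursts(host, paths):
    """One write per file; the client reads the whole response before the next write."""
    return [build_get(host, p) for p in paths]


def pipelined_bursts(host, paths):
    """A single write carrying every request; responses are then read back in order."""
    return [b"".join(build_get(host, p) for p in paths)]


# Hypothetical host and file names, for illustration only.
HOST = "mirror.example.org"
PATHS = ["/bash.dsc", "/bash.orig.tar.gz", "/bash.diff.gz"]

for label, bursts in (("keep-alive", keep_alive_bursts(HOST, PATHS)),
                      ("pipelined", pipelined_bursts(HOST, PATHS))):
    print(f"{label}: {len(bursts)} write(s), "
          f"{sum(b.count(b'GET ') for b in bursts)} request(s)")
```

Both variants send the same bytes over one connection. The only
difference is whether the client waits for a response between
requests, which is exactly the round trip that hurts on a slow link.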

But even pipelining would not help here, since the pdiff files are not
queued up with the http method in advance; they are requested one
after the other.

> On my machines, I see the behavior Miles described: lots of disk I/O.
> Obviously, APT reconstructs every intermediate version of the packages
> file.

Yes, I noticed that too. Patching a 15MB Packages file takes a lot of
time. Even on a modern amd64 system, the rred runs usually take long
enough that you can watch their progress.

> The fix is to combine the diffs before applying them, so that you only
> need to process the large Packages file once.  I happen to have ML
> code which does this (including the conversion to a patch
> representation which is more amenable to this kind of optimization)
> and would be willing to port it to C++, but someone else would need to
> deal with the APT integration because I'm not familiar with its
> architecture.

What code do you need there? If the rred method keeps the full Index
file in memory during patching it can just be fed all the patches one
after another and only write out the final result at the
end. Combining the patches is a simple cat.
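
Roughly what I have in mind, as a sketch and not the actual rred
code: keep the index as a list of lines in memory, apply each
ed-style script in turn, and write the result out only once at the
end. The function names are made up for illustration. I assume
scripts as produced by `diff --ed`, using only the a, c and d
commands, and I ignore the special handling diff needs for text lines
consisting of a single ".".

```python
import re

# Matches ed commands such as "5a", "3,4d" or "2,3c".
_CMD = re.compile(r"^(\d+)(?:,(\d+))?([acd])$")


def apply_ed_script(lines, script):
    """Apply one ed-style script (a list of lines) to `lines` in place."""
    it = iter(script)
    for cmd in it:
        m = _CMD.match(cmd)
        if m is None:
            raise ValueError(f"unsupported ed command: {cmd!r}")
        start = int(m.group(1))
        end = int(m.group(2) or start)
        op = m.group(3)
        new = []
        if op in "ac":
            # Text for 'a' and 'c' follows the command, terminated by ".".
            for text in it:
                if text == ".":
                    break
                new.append(text)
        if op == "a":
            lines[start:start] = new        # insert after 1-based line `start`
        elif op == "d":
            del lines[start - 1:end]        # delete lines start..end
        else:
            lines[start - 1:end] = new      # replace lines start..end
    return lines


def apply_all(lines, scripts):
    """Feed every patch through in order; nothing is written in between."""
    for script in scripts:
        apply_ed_script(lines, script)
    return lines


index = ["a", "b", "c"]
patches = [["2c", "B", "."], ["3a", "d", ".", "1d"]]
print(apply_all(index, patches))
```

Splicing a Python list is of course not fast for a 15MB file; the
point is only that the intermediate versions never have to touch the
disk.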

Regards,
        Goswin
