
Re: Better pdiff handling for apt



On 2014-01-05 11:40, Vitaliy Filippov wrote:
> Hi!
> 
>> (I think a nice efficient compromise method would be Patch-Step-Size:
>> 8 8 4 4 4 4 2 2 1, which would only update a lg(diffs) each time you
>> updated the packages file, and only require users to download and
>> apply a maximum of lg(diffs) to get up to date)
> 
> Thank you, better pdiff handling is a great idea! I personally have
> pdiffs disabled everywhere with the current algorithm, simply because
> apt often can't finish updating in any reasonable time using pdiffs, and
> it also eats very much CPU during the update process. Maybe that's
> because 'ed' utility is slower than 'patch'? Or it isn't?
> 

The slowness occurs because each pdiff is applied on its own, generating
a new file every time.  Example:

  start-file + patch1 -> intermediate-file1
  intermediate-file1 + patch2 -> intermediate-file2
  ...
  intermediate-file(N-1) + patchN -> desired-file

Where start-file, intermediate-fileX and desired-file can each be 30MB
or more (uncompressed).  With merging you get:

  merge patch1 patch2 ... patchN -> super-patch
  start-file + super-patch -> desired-file

The patches are (usually?) vastly smaller than the intermediate files,
and merging them is faster than creating the intermediate files.
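
To make the difference concrete, here is a toy sketch in Python.  It is
NOT the ed-script format that pdiffs really use: as a simplifying
assumption the Packages file is modelled as a dict of package name ->
stanza, and a patch as a dict of changes (None meaning "removed").  All
function names are made up for illustration.

```python
def apply_patch(packages, patch):
    """Return a new index with `patch` applied; the input is left untouched."""
    result = dict(packages)          # full copy: stands in for rewriting the file
    for name, stanza in patch.items():
        if stanza is None:
            result.pop(name, None)   # None marks a removed package
        else:
            result[name] = stanza
    return result


def merge_patches(patches):
    """Combine patches (oldest first) into one; later changes win."""
    merged = {}
    for patch in patches:
        merged.update(patch)         # only touches the (small) patches
    return merged


def update_sequentially(packages, patches):
    """One full copy of the index per patch (the slow path)."""
    for patch in patches:
        packages = apply_patch(packages, patch)
    return packages


def update_merged(packages, patches):
    """Merge the patches first, then make a single full copy."""
    return apply_patch(packages, merge_patches(patches))


start = {"apt": "1.0", "dpkg": "1.17", "ed": "1.9"}
patches = [
    {"apt": "1.1"},
    {"ed": None, "patch": "2.7"},
    {"apt": "1.2", "ed": "1.10"},
]
# Both routes end at the same desired-file.
assert update_sequentially(start, patches) == update_merged(start, patches)
```

In this model the sequential route copies the whole index once per
patch, while the merged route copies it once in total; the merging
itself only walks the patches.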

The second bottleneck in the PDiff implementation is that APT downloads
one patch at a time, applies it, checks the checksum of the resulting
file and then checks whether it needs another patch.  So in my list
above, you have to add a "fetch patchX" before each patch line.
  Mind you, this approach makes sense if there is server-side merging of
patches.  The current PDiff Index format does not tell you what the
result of applying a patch will be, so APT is playing it "better safe
than sorry" (compared with apt-file's implementation, which goes for
speed above all).
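
A rough Python sketch of that fetch/apply/verify loop, under stated
assumptions: the file is again modelled as a dict rather than real
text, `history` is a hypothetical lookup from "checksum of the file I
have" to "name of the next patch", and `fetch` stands in for one
network round-trip.  None of these names come from APT itself.

```python
import hashlib
import json


def digest(state):
    """Checksum of a file state (a dict, serialised deterministically)."""
    data = json.dumps(state, sort_keys=True).encode()
    return hashlib.sha256(data).hexdigest()


def apply_patch(state, patch):
    """Return a new state with `patch` applied (None removes an entry)."""
    new_state = dict(state)
    for name, stanza in patch.items():
        if stanza is None:
            new_state.pop(name, None)
        else:
            new_state[name] = stanza
    return new_state


def update_one_at_a_time(state, history, fetch, target_digest):
    """Fetch, apply and verify patches one by one until the target is reached."""
    fetched = []
    while digest(state) != target_digest:        # checksum after every step
        patch_name = history.get(digest(state))
        if patch_name is None:
            raise LookupError("no patch for this state; download the full file")
        patch = fetch(patch_name)                # one round-trip per patch
        fetched.append(patch_name)
        state = apply_patch(state, patch)        # one full rewrite per patch
    return state, fetched


v0 = {"apt": "1.0", "ed": "1.9"}
p1 = {"apt": "1.1"}
p2 = {"ed": None}
v1 = apply_patch(v0, p1)
v2 = apply_patch(v1, p2)

history = {digest(v0): "patch1", digest(v1): "patch2"}
server = {"patch1": p1, "patch2": p2}            # stand-in for the mirror

result, fetched = update_one_at_a_time(v0, history, server.__getitem__, digest(v2))
```

Each pass through the loop costs a download, a full rewrite and a
checksum of the whole file, which is where the time goes when many
patches are pending.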

> And also I have a question about the implementation - if the old patches
> (diffs) are merged, how does it rebuild patches after package updates?
> 

Server-side? Don't know - but client-side merging is not difficult (and
implementations of it already exist), so if this is a problem
server-side, my personal recommendation would be to do merging only on
the client side.

~Niels


