Re: These new diffs are great, but...
* Goswin von Brederlow:
>> However, patching rred to apply patches in a single run would be a
>> good start because all further optimizations will need it.
> Why should the number of chunks matter?
If you use the naïve algorithm, it does. But rred implements
something more involved, leading to a O(patch-files * lines + changes)
complexity (or something very similar).
> What matters is reading, parsing and writing the file O(lines) and
> then the number of changes (lines of changes) O(changes). Combined
> this gives O(lines + changes) if the file is read once at the start
> and then all patches are applied.
I guess you'll find it very hard to get down to O(lines + changes). 8-)
Either you need to make multiple passes over the original file (like
the current code does), or you need to combine patch files before
applying them. The latter involves some kind of sorting, and unless
you postulate an upper bound on the number of lines in a file, you'll
end up with super-linear complexity.
> You can do that by combining the individual patch files or by teaching
> rred to do a single run with multiple patch files. Same result. Both
> solve the problem of O(lines * chunks + changes) complexity.
The real problem is that APT's architecture doesn't allow rred to look
at all patches at once. At least that's what I've been told.
> As for using pointers to lines and shuffeling them that seems to be
> the only sane thing to do. All patch operations are line based so it
> is essential that a line can be found and replaced in O(1). A simple
> array of pointers to lines solves that.
Inserting in the middle of an array is not a constant-time operation.
That's why the naïve algorithm is slow.