[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: These new diffs are great, but...

Florian Weimer <fw@deneb.enyo.de> writes:

> * Goswin von Brederlow:
>> What code do you need there? If the rred method keeps the full Index
>> file in memory during patching it can just be fed all the patches one
>> after another and only write out the final result at the
>> end. Combining the patches is a simple cat.
> #383881 suggests that I/O bandwidth is not the issue.  In fact, if you
> keep the file in memory and repeatedly patch it, you won't get away
> from the O(n*m) complexity (n being the file size, m the number of
> hunks in the patches), or whatever complexity it is.  Shuffling
> pointers instead of full lines only saves a constant factor, which
> might not be enough.
> However, patching rred to apply patches in a single run would be a
> good start because all further optimizations will need it.

Why should the number of chunks matter?

What matters is reading, parsing and writing the file O(lines) and
then the number of changes (lines of changes) O(changes). Combined
this gives O(lines + changes) if the file is read once at the start
and then all patches are applied.

You can do that by combining the individual patch files or by teaching
rred to do a single run with multiple patch files. Same result. Both
solve the problem of O(lines * chunks + changes) complexity.

As for using pointers to lines and shuffeling them that seems to be
the only sane thing to do. All patch operations are line based so it
is essential that a line can be found and replaced in O(1). A simple
array of pointers to lines solves that.


Reply to: