Re: These new diffs are great, but...

To: Florian Weimer <fw@deneb.enyo.de>
Cc: Goswin von Brederlow <brederlo@informatik.uni-tuebingen.de>, Marc Haber <mh+debian-devel@zugschlus.de>, debian-devel@lists.debian.org
Subject: Re: These new diffs are great, but...
From: Goswin von Brederlow <brederlo@informatik.uni-tuebingen.de>
Date: Thu, 24 Aug 2006 06:14:32 +0200
Message-id: <87d5aq3h3r.fsf@informatik.uni-tuebingen.de>
In-reply-to: <874pw7s8mo.fsf@mid.deneb.enyo.de> (Florian Weimer's message of "Sun, 20 Aug 2006 17:57:03 +0200")
References: <20060629180509.GB1986@yi.org> <20060629183541.GA11648@piper.madduck.net> <20060629183836.GA5061@uio.no> <20060629184345.GG1986@yi.org> <E1FwC3l-0002TZ-P2@scyw00225.scy001.de> <20060630091037.GA13596@rotes76.wohnheim.uni-kl.de> <E1FwJCZ-0006yZ-JA@scyw00225.scy001.de> <87hd22sj2j.fsf@mid.deneb.enyo.de> <871wsxvbft.fsf@informatik.uni-tuebingen.de> <874pw7s8mo.fsf@mid.deneb.enyo.de>

Florian Weimer <fw@deneb.enyo.de> writes:

> * Goswin von Brederlow:
>
>> What code do you need there? If the rred method keeps the full Index
>> file in memory during patching it can just be fed all the patches one
>> after another and only write out the final result at the
>> end. Combining the patches is a simple cat.
>
> #383881 suggests that I/O bandwidth is not the issue.  In fact, if you
> keep the file in memory and repeatedly patch it, you won't get away
> from the O(n*m) complexity (n being the file size, m the number of
> hunks in the patches), or whatever complexity it is.  Shuffling
> pointers instead of full lines only saves a constant factor, which
> might not be enough.
>
> However, patching rred to apply patches in a single run would be a
> good start because all further optimizations will need it.

Why should the number of chunks matter?

What matters is reading, parsing and writing the file, which is
O(lines), and then applying the changes, which is O(changes) counted
in changed lines. Combined this gives O(lines + changes) if the file
is read once at the start and then all patches are applied.

You can do that by combining the individual patch files or by teaching
rred to do a single run with multiple patch files. Same result. Both
solve the problem of O(lines * chunks + changes) complexity.
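A minimal sketch of the single-run approach, in Python. It assumes the
patches are plain `diff --ed` scripts using only the `a`, `c` and `d`
commands (the `s` trick ed uses for a line consisting of a lone `.` is
not handled); `apply_ed_script` and `apply_patches` are hypothetical
names, not rred's actual code. The file is split into lines once, every
patch is applied to that in-memory list, and the result is joined once.

```python
import re

# Matches diff --ed command lines such as "12a", "3,7c" or "5d".
ED_COMMAND = re.compile(r"(\d+)(?:,(\d+))?([acd])")


def apply_ed_script(lines, script):
    """Apply one diff --ed script to `lines` (a list of str) in place.

    Line numbers are interpreted against the current state of `lines`
    when each command runs, which is how ed itself executes a script.
    """
    commands = iter(script.splitlines())
    for cmd in commands:
        m = ED_COMMAND.fullmatch(cmd)
        if m is None:
            raise ValueError(f"unsupported ed command: {cmd!r}")
        start = int(m.group(1))
        end = int(m.group(2) or m.group(1))
        op = m.group(3)
        block = []
        if op in "ac":
            # Collect the text block up to the terminating "." line.
            for text in commands:
                if text == ".":
                    break
                block.append(text)
        if op == "a":
            lines[start:start] = block      # insert after line `start`
        else:
            lines[start - 1:end] = block    # 'c' replaces; 'd' deletes (block is empty)


def apply_patches(text, scripts):
    """Parse the file once, apply every patch in memory, serialize once."""
    lines = text.splitlines()
    for script in scripts:
        apply_ed_script(lines, script)
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    base = "a\nb\nc\n"
    p1 = "2c\nB\n.\n"          # replace line 2
    p2 = "3a\nd\n.\n1d\n"      # append after line 3, then delete line 1
    # Feeding the patches one by one or their concatenation ("cat")
    # gives the same result.
    assert apply_patches(base, [p1, p2]) == apply_patches(base, [p1 + p2])
```

Because each command is applied to the list as it stands, the
concatenated script and the separate scripts produce identical output
(`"B\nc\nd\n"` for the example above).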

As for using pointers to lines and shuffling them, that seems to be
the only sane thing to do. All patch operations are line based, so it
is essential that a line can be found and replaced in O(1). A simple
array of pointers to lines solves that.
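To illustrate the array-of-pointers idea, a small Python sketch with
hypothetical helpers `index_lines`, `replace_line` and `serialize` (not
rred's code): the file stays in one `bytes` buffer, the list holds
zero-copy `memoryview` slices into it, and replacing line N is a single
slot assignment. Inserting or deleting lines in the middle of a Python
list still shifts the later entries, but only references move, never
the line contents.

```python
def index_lines(buf):
    """Return one zero-copy memoryview per line of `buf` (newline excluded)."""
    view = memoryview(buf)
    lines = []
    start = 0
    while start < len(buf):
        end = buf.find(b"\n", start)
        if end == -1:
            end = len(buf)           # last line without trailing newline
        lines.append(view[start:end])
        start = end + 1
    return lines


def replace_line(lines, lineno, text):
    """Replace 1-based line `lineno`: only the slot's reference changes."""
    lines[lineno - 1] = text


def serialize(lines):
    """Write the result once; bytes.join accepts memoryview and bytes alike."""
    return b"\n".join(lines) + b"\n" if lines else b""


if __name__ == "__main__":
    buf = b"Package: foo\nVersion: 1\n"
    lines = index_lines(buf)
    replace_line(lines, 2, b"Version: 2")
    assert serialize(lines) == b"Package: foo\nVersion: 2\n"
```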

Regards,
        Goswin
