
Re: Faster importing of packages (ddtp.debian.org)


Hi Martijn,

> 2. Only import changes. Each day there are Packages.diff files
> produced with just the changes from the previous day. In theory you
> could use this to just import the packages that have changed. Problem
> is I can't find much information about how that actually works. It
> looks like an ed-style diff, but I'm not sure.

You are right that the pdiffs are ed-style diffs, which are a right pain in 
the rear end to work with. Unfortunately, you need the *old* Packages file 
and the diff to work out what changed; the *new* Packages file and the diff 
are insufficient because an ed-style diff, unlike a normal patch, is not 
reversible.
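
To illustrate why the old file is needed, here is a minimal sketch of 
applying an ed-style diff in Python. It is not the code from lspdiff; the 
function name apply_ed_script is made up for this example, and it only 
handles the plain "a", "c" and "d" commands that "diff -e" emits (no 
escaping of lines that consist of a single "."). Note that a delete command 
such as "1,3d" carries only line numbers, so the removed stanza can only be 
recovered from the old Packages file.

```python
import re

# Hypothetical helper: applies the a/c/d commands of an ed-style diff.
CMD_RE = re.compile(r"^(\d+)(?:,(\d+))?([acd])$")


def apply_ed_script(old_lines, script_lines):
    """Return a new list of lines: old_lines with the ed script applied.

    Both arguments are lists of lines without trailing newlines.
    Commands are applied in the order given; `diff -e` emits them from
    the bottom of the file upwards, so line numbers stay valid.
    """
    lines = list(old_lines)
    i = 0
    while i < len(script_lines):
        match = CMD_RE.match(script_lines[i])
        if match is None:
            raise ValueError(f"unsupported ed command: {script_lines[i]!r}")
        start = int(match.group(1))
        end = int(match.group(2) or start)
        op = match.group(3)
        i += 1

        new_text = []
        if op in "ac":
            # The inserted text runs up to a line containing only ".".
            while script_lines[i] != ".":
                new_text.append(script_lines[i])
                i += 1
            i += 1  # skip the terminating "."

        if op == "a":
            lines[start:start] = new_text        # insert after line `start`
        elif op == "d":
            del lines[start - 1:end]             # old content is not in the diff
        else:
            lines[start - 1:end] = new_text      # replace lines start..end
    return lines
```

Given the old lines and a script of ["5c", "Version: 3", ".", "1,3d"], the 
first three old lines are dropped without the diff ever naming them.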

A while back I wrote two pieces of code that could be helpful here:

* lspdiff:
Run as
    lspdiff --packages Packages.old --pdiff 2012-07-10-2042.36.pdiff
you get a list of package names that have changed in some way (added, 
deleted or changed). You need to run it for each pdiff file in the sequence 
from the oldest Packages you have through to the current Packages, creating 
the intermediate Packages files with patch --ed for each step.

* deb822diff
Run as 
    deb822diff Packages.old Packages
you get a list of package names that have changed in some way (added, 
deleted or changed). You only need to run it once for the "old" and "new" 
Packages files. [This is actually a wrapper around a Python module, and it's 
trivial to have it work on Packages.gz or Packages.bz2 or ... I wonder if 
the module would be useful for python-debian one day.]
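
For a rough idea of what the second approach involves, here is a sketch in 
plain Python. It is not the actual deb822diff code; the function names 
(open_packages, read_stanzas, changed_packages) are invented for this 
example, and it assumes the usual Packages layout of blank-line-separated 
stanzas that each contain a "Package:" field.

```python
import bz2
import gzip


def open_packages(path):
    """Open a Packages file as text, decompressing .gz or .bz2 if needed."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", encoding="utf-8")
    return open(path, encoding="utf-8")


def read_stanzas(text):
    """Map each package name to the set of raw stanzas carrying that name."""
    stanzas = {}
    for block in text.split("\n\n"):
        block = block.strip("\n")
        if not block:
            continue
        name = None
        for line in block.split("\n"):
            if line.startswith("Package:"):
                name = line.split(":", 1)[1].strip()
                break
        if name is None:
            continue  # not a package stanza; ignored in this sketch
        # A set copes with several versions of one package in the same file.
        stanzas.setdefault(name, set()).add(block)
    return stanzas


def changed_packages(old_text, new_text):
    """Return sorted names of packages that were added, deleted or changed."""
    old = read_stanzas(old_text)
    new = read_stanzas(new_text)
    return sorted(
        name for name in old.keys() | new.keys()
        if old.get(name) != new.get(name)
    )
```

Usage would be along the lines of reading both files with 
open_packages(path).read() and passing the two strings to changed_packages.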

Of the two, I suspect that the latter is easier to work with. There is 
clearly a little scripting work to do to make sure that you keep the old 
Packages from the last full or incremental import around somewhere. Given 
the list of changed packages, deleting those rows from the db and then 
importing the updated rows would be the best approach. You could update 
timestamps on non-changed entries at the same time if you wanted.
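
The delete-then-import step might look something like the sketch below. 
Everything about the database here is an assumption made for illustration: 
sqlite3 stands in for whatever database the import really uses, and the 
"packages" table with its package, version and description columns is 
hypothetical, as is the function name apply_changes.

```python
import sqlite3


def apply_changes(conn, changed, new_rows):
    """Replace the rows for the changed packages in one transaction.

    changed:  iterable of package names that were added, deleted or changed
    new_rows: dict mapping package name -> (version, description) for the
              packages present in the new Packages file
    The table and column names are hypothetical.
    """
    changed = list(changed)
    with conn:  # commits on success, rolls back on an exception
        # Drop every changed package, including those that were deleted.
        conn.executemany(
            "DELETE FROM packages WHERE package = ?",
            [(name,) for name in changed],
        )
        # Re-insert only the ones that still exist in the new file.
        conn.executemany(
            "INSERT INTO packages (package, version, description) "
            "VALUES (?, ?, ?)",
            [(name, *new_rows[name]) for name in changed if name in new_rows],
        )
```

Doing both statements inside a single transaction means a failed import 
leaves the previous rows in place.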

Both of these bits of code need some polish to make them useful to you 
(and to UDD which is what I originally wrote them for) and I'm quite happy 
to help with that -- feel free to contact me directly so we don't fill the 
list with noise about this.


-- 
Stuart Prescott    http://www.nanonanonano.net/   stuart@nanonanonano.net
Debian Developer   http://www.debian.org/         stuart@debian.org
GPG fingerprint    BE65 FD1E F4EA 08F3 23D4 3C6D 9FE8 B8CD 71C5 D1A8
