
Bug#128818: [patch] packages.gz diff support for apt



Michael wrote:
> The code will download until it finds an empty patch; it then assumes
> that the index is now up-to-date and stops. If it does not find a
> patch it will auto-fallback to Packages.bz2 and then to
> Packages.gz. The code is diffed against the arch repository at:
> http://people.debian.org/~mdz/arch/apt@packages.debian.org
> (apt@packages.debian.org/apt--main--0)

FWIW, what I was considering last I looked at this (Dec 2003 apparently...) was a combination of an index file and gzipped --ed diffs. The index file gives you a bit more control over your patches, and some redundancy so you can check if you've gotten everything screwed up; --ed style diffs happen to kick ass for this problem.

So the index file I was imagining looked like:

Canonical-Name: netstat.txt
MD5-History:
 e43a9e356e65b79bde65ea8794594d9b    1934 2003-12-01-1259.10
 68ad5015da0dbd75b83ae03ea68c0fbd    1934 2003-12-01-1259.44
 51d331edc38ba522a1c95002e6ee91c9    2096 2003-12-01-1300.19
 11d476ccadd18072dfbb6d6907274b8b    1853 2003-12-01-1300.53
 ae1076d482f0376ee86c2ee9c6342fd4    1691 2003-12-01-1301.27
 756f08019c209eea1cc1fe0497ebd2f7    1691 2003-12-01-1302.01
MD5-Patches:
 1968c0ddf9761d0e6a8b1fa8766b32c8     882 2003-12-01-1259.10
 f3d6619a17d3065dee83bb3b6e328453     797 2003-12-01-1259.44
 5f2791687760176a6d243f4da0f6757b     468 2003-12-01-1300.19
 fc22e01fc575d8f9dbb4d4cd1ef1fb2d     468 2003-12-01-1300.53
 b75d4c0b33d2a76284ed86c395a60192     461 2003-12-01-1301.27
 b4e2aa24bda367acc9f83740840e5bc1     461 2003-12-01-1302.01

The History section tells you what the original file you're patching from was, and the Patches section lets you validate the patch you're about to apply. Knowing the md5sum/size of the original file is obviously crucial, since that's how you know what patch to apply. Knowing the md5sum/size of what you're going to end up with is a useful sanity check, so that you can stop halfway through if you've somehow managed to get yourself into a loop or similar. Knowing the md5sum of the patches is useful just in case diff has a root exploit. Knowing the size of the patches you need to download is good for progress bars. Knowing the date of the resulting Packages file you're going to create at each step is useful for debugging -- while you might expect daily patches for testing/unstable, they'll come at much more irregular intervals for stable or security updates.
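
To make the client side of this concrete, here is a rough Python sketch (not part of update.py) of how a client might read the index and work out which patches it needs. The names parse_index and patches_needed are made up for illustration, and the sketch assumes the timestamps sort chronologically as plain strings, which holds for the format above. Note that the index as shown does not record the checksum of the newest file, so this sketch cannot tell an already-current file from an unknown one; both come back as None.

```python
def parse_index(text):
    """Parse the index format shown above into (name, history, patches).

    history and patches map a timestamp string to an (md5, size) tuple.
    """
    name = None
    sections = {"MD5-History:": {}, "MD5-Patches:": {}}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] == " " and current is not None:
            # Indented lines belong to the section opened most recently.
            md5, size, stamp = line.split()
            current[stamp] = (md5, int(size))
            continue
        fields = line.split()
        if fields[0] == "Canonical-Name:":
            name = fields[1]
            current = None
        else:
            # Unknown headers reset the section, so their
            # continuation lines are ignored.
            current = sections.get(fields[0])
    return name, sections["MD5-History:"], sections["MD5-Patches:"]


def patches_needed(history, local_md5, local_size):
    """Return the timestamps of the patches to apply, oldest first.

    Returns None when the local file matches no History entry; the
    caller would then fall back to a full download.
    """
    stamps = sorted(history)
    for i, stamp in enumerate(stamps):
        if history[stamp] == (local_md5, local_size):
            return stamps[i:]
    return None
```

Summing the Patches sizes for the returned timestamps gives the total download size for a progress bar, and the History entry of the next timestamp is the checksum to expect after applying each patch.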

The attached "update.py" is a Python script that, when invoked as:

	./update.py Index file file.prev

will generate an --ed style diff and update the Index in the format listed above. It'll also limit the number of patches to 14, deleting any that are too far out of date.
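
For anyone who hasn't looked inside an ed-style diff: it is a list of ed commands (`Na`, `N,Mc`, `N,Md`), emitted from the end of the file toward the start so that earlier line numbers stay valid as each command is applied. A minimal sketch of applying one in Python follows. apply_ed_diff is a hypothetical helper, not something apt or update.py provides, and it ignores the special handling GNU diff uses for text lines that consist of a single ".".

```python
import re

# Matches "N" or "N,M" followed by one of the commands a, c or d.
_CMD = re.compile(r"^(\d+)(?:,(\d+))?([acd])$")


def apply_ed_diff(old_lines, script_lines):
    """Apply an ed script (as produced by `diff --ed`) to a list of lines.

    Both arguments are lists of lines without trailing newlines.
    """
    lines = list(old_lines)
    it = iter(script_lines)
    for cmd in it:
        m = _CMD.match(cmd)
        if m is None:
            raise ValueError("unsupported ed command: %r" % cmd)
        start = int(m.group(1))
        end = int(m.group(2) or start)
        op = m.group(3)
        new = []
        if op in "ac":
            # The text to insert follows, terminated by a lone ".".
            for text in it:
                if text == ".":
                    break
                new.append(text)
        if op == "a":
            lines[start:start] = new      # append after line `start`
        else:
            lines[start - 1:end] = new    # 'c' replaces; 'd' deletes (new is empty)
    return lines
```

A real client would also check the md5sum and size of the result against the index before moving on to the next patch.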

The example index above was generated by something like:

	while : ; do
		cat netstat.txt > orignetstat.txt
		netstat > netstat.txt
		./update.py index.txt netstat.txt orignetstat.txt
		sleep 30
	done

Cheers,
aj

#!/usr/bin/env python

import datetime, sys, os
import apt_pkg

class Updates:
    def __init__(self, readme = None):
        self.can_name = None
        self.history = {}
        self.max = 14  # keep at most this many patches

        if readme:
            f = open(readme)
            x = f.readline()

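            def read_md5s(ind, x=x):
                # Read the indented "md5 size timestamp" lines of one section
                # into self.history[timestamp][ind]; return the first line
                # that is not part of the section.
                while 1:
                    x = f.readline()
                    if not x or x[0] != " ": break
                    l = x.split()
                    if l[2] not in self.history:
                        self.history[l[2]] = [None,None]
                    self.history[l[2]][ind] = (l[0], int(l[1]))
                return x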

            while x:
                l = x.split()

                if len(l) == 0:
                    x = f.readline()
                    continue

                if l[0] == "Canonical-Name:":
                    self.can_name = l[1]
                    x = f.readline()
                    continue

                if l[0] == "MD5-History:":
                    x = read_md5s(0)
                    continue

                if l[0] == "MD5-Patches:":
                    x = read_md5s(1)
                    continue

                x = f.readline()

    def dump(self, out=sys.stdout):
        out.write("Canonical-Name: %s\n" % (self.can_name))
        hs = self.history
        l = sorted(hs.keys())

        # Drop the oldest patches once there are more than self.max.
        cnt = len(l)
        if cnt > self.max:
            for h in l[:cnt-self.max]:
                os.unlink("%s.diff" % (h))
                del hs[h]
            l = l[cnt-self.max:]

        out.write("MD5-History:\n")
        for h in l:
            out.write(" %s %7d %s\n" % (hs[h][0][0], hs[h][0][1], h))
        out.write("MD5-Patches:\n")
        for h in l:
            out.write(" %s %7d %s\n" % (hs[h][1][0], hs[h][1][1], h))


format = "%Y-%m-%d-%H%M.%S"
now = datetime.datetime.utcnow().strftime(format)
(outfile, newfile, oldfile) = sys.argv[1:4]

tmpfile = oldfile + ".tmp"
difffile = now + ".diff"

upd = Updates(outfile)

os.link(newfile, tmpfile)

def sizemd5(fn):
    # Return (md5sum, size) for the file fn.
    size = os.stat(fn).st_size
    f = open(fn, "rb")
    md5sum = apt_pkg.md5sum(f)
    f.close()
    return (md5sum, size)

oldsizemd5 = sizemd5(oldfile)
newsizemd5 = sizemd5(tmpfile)

if newsizemd5 == oldsizemd5:
    # Nothing changed, so there is no patch to record.
    os.unlink(tmpfile)
else:
    os.system("diff --ed %s %s > %s" % (oldfile, tmpfile, difffile))
    difsizemd5 = sizemd5(difffile)

    # History records the checksum of the file being patched *from*.
    upd.history[now] = (oldsizemd5, difsizemd5)

    os.rename(tmpfile, oldfile)

    f = open(outfile, "w")
    upd.dump(f)
    f.close()
