[I'm redirecting this thread to debian-devel]

> Have you looked at rsync? It does something pretty close to this.

I use rsync on my network - it's a pretty cool program. But what I'm working on is much simpler.

What I started to work on doesn't involve doing binary diffs. Basically, all I do is check which files are in each package, and identify which ones are different.

For example, if I already have gimp-data_0.99.10-1.deb on my side of my 28.8k connection to the Internet, and I want gimp-data_0.99.10-2.deb (just uploaded to Incoming) - why download the entire thing if most of the files haven't changed?

File sizes:

  gimp-data_0.99.10-1.deb  1443436
  gimp-data_0.99.10-2.deb  1443436  (same size!)

So what's different?

  - timestamps on all the files
  - control file

That's all. If we could generate a tiny "Debian diff" file, which could be applied against gimp-data_0.99.10-1.deb to turn it into gimp-data_0.99.10-2.deb - then I'd save 15 minutes of downloading time. And that's just for one package; imagine the bandwidth savings for mirroring the entire distribution.

The bandwidth savings won't always be this dramatic. If a file's contents changed, the Debian diff would just include the whole new file (not a binary diff of it).

How the protocol would work is pretty simple. The client would send the server a description of the guts of the .deb file it already has. The server would compare that against the description of the guts of the newer .deb file on its side, and send back a "Debian diff" file with the changes. The client would apply the "Debian diff" file - and voila, the package is updated.

Because the server must have a little bit of smarts for this to work, it's pretty important that it doesn't have to be constantly unpacking the .deb files just to compare them. Fortunately, the 'description of the guts of the .deb file' needed is pretty similar to what Klee generates as "package certificates" with dpkg-cert (md5sums and such) -- and I think he's going to build it into the .deb file format.

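To make the comparison step concrete, here is a rough sketch in Python. It is only an illustration under assumptions: the "description of the guts" is modelled as a plain mapping from file path to MD5 digest, which is not the actual dpkg-cert format, and the names build_manifest and plan_debian_diff are made up for this example.

```python
import hashlib


def build_manifest(files):
    """Map each path to the MD5 hex digest of its contents.

    `files` is a dict of path -> bytes; a stand-in for whatever a
    package certificate would really contain (an assumption).
    """
    return {path: hashlib.md5(data).hexdigest() for path, data in files.items()}


def plan_debian_diff(client_manifest, server_manifest):
    """Work out what the server must send, without unpacking any .deb.

    Files that are new or whose digest differs are sent whole (no binary
    diff); files that no longer exist in the new package are listed for
    deletion.
    """
    send = sorted(
        path
        for path, digest in server_manifest.items()
        if client_manifest.get(path) != digest
    )
    delete = sorted(path for path in client_manifest if path not in server_manifest)
    return {"send": send, "delete": delete}


# Toy stand-ins for the -1 and -2 packages: only the control file differs.
old = build_manifest({
    "control": b"Version: 0.99.10-1\n",
    "usr/share/gimp/brush.gbr": b"brush-data",
})
new = build_manifest({
    "control": b"Version: 0.99.10-2\n",
    "usr/share/gimp/brush.gbr": b"brush-data",
})

plan = plan_debian_diff(old, new)
print(plan)  # {'send': ['control'], 'delete': []}
```

Since only digests are compared, the server never has to open the packages themselves, provided the manifests are already available alongside them. Timestamps are not covered by this sketch; they would need to travel in the diff as metadata.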
The server doesn't have to do too much processing - so I think it might be practical to use this approach on a public server, whereas the rsync approach might be too 'heavy' in terms of processing.

It would even be possible to use something like dpkg-repack to build a package based on the files that are already installed -- generate a certificate, and get the "Debian diff" back to generate a valid updated package! This might be a bit heavy on processing on the client side - but might be faster than downloading over a slow link.

The extremely simple protocol would basically be just an HTTP 'put' request to a CGI script running on a webserver. That way, anybody behind a firewall would still be able to access public servers on the Internet. That would be better than rsync, which is dependent on rsh or ssh.

I'd like to implement this as soon as possible. But I'm entertaining company this long weekend - then I have to work at a client's place in Vancouver next week. So don't hold your breath. If someone else wants to steal the idea -- feel free to do it. (I've still got dwww to do + the Debian developer database thingy too)

Cheers,

 - Jim