
Re: slowing down point releases

[I'm redirecting this thread to debian-devel]

> Have you looked at rsync?  It does something pretty close to this.

I use rsync on my network - it's a pretty cool program.  But what I'm
working on is much simpler.

What I started to work on doesn't involve doing binary diffs.
Basically, all I do is look at which files are in each package, and
identify which ones are different.

For example, if I already have gimp-data_0.99.10-1.deb on my side
of my 28.8k connection to the Internet, and I want 
gimp-data_0.99.10-2.deb (just uploaded to Incoming) - why download 
the entire thing if most of the files haven't changed?

File Sizes:
gimp-data_0.99.10-1.deb    1443436
gimp-data_0.99.10-2.deb    1443436  (same size!)

So what's different?

- timestamps on all the files
- control file 

That's all.  If we could generate a tiny "Debian diff" file, which
could be applied against gimp-data_0.99.10-1.deb to turn it into
gimp-data_0.99.10-2.deb - then I'd save 15 minutes of downloading
time.  And that's just for one package; imagine the bandwidth
savings for mirroring the entire distribution.

The bandwidth savings aren't always this dramatic.  If a file has
changed, the Debian diff would just include the whole new version of
that file (not a binary diff).
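
As a rough sketch of the file-level comparison described above (not the
actual tool): the helper names describe() and compare() and the sample
file contents are made up for illustration, and each package is modelled
as a plain dict of path -> bytes rather than a real .deb archive.
Timestamps are ignored here.

```python
import hashlib

def describe(files):
    """Map each path to the MD5 hex digest of its contents."""
    return {path: hashlib.md5(data).hexdigest() for path, data in files.items()}

def compare(old_desc, new_desc):
    """Return (changed_or_added, removed) paths between two descriptions."""
    changed = sorted(p for p, digest in new_desc.items() if old_desc.get(p) != digest)
    removed = sorted(p for p in old_desc if p not in new_desc)
    return changed, removed

# Hypothetical stand-ins for the -1 and -2 packages: only the control file differs.
old = describe({"control": b"Version: 0.99.10-1\n", "usr/share/gimp/palette": b"same bytes"})
new = describe({"control": b"Version: 0.99.10-2\n", "usr/share/gimp/palette": b"same bytes"})

print(compare(old, new))  # (['control'], [])
```

Only the paths in the first list would need to travel over the wire;
everything else is reused from the copy already on disk.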

The protocol would be pretty simple.  The client would
send a description of the guts of its .deb file to the server,
which would compare it against the description of the guts of the
newer .deb file it holds - and send back a "Debian diff" file with the
changes.  The client would apply the "Debian diff" file - and voila,
the package is updated.
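
The round trip above might look something like the following sketch.
Everything here is an assumption made for illustration: the function
names, the "replace"/"delete" layout of the diff, and the use of
in-memory dicts (path -> bytes) in place of real .deb files.  The server
is handed a ready-made description of its own package, so it only reads
file contents for the paths it actually has to send.

```python
import hashlib

def md5(data):
    return hashlib.md5(data).hexdigest()

def server_make_diff(client_desc, server_desc, server_files):
    """Server side: pick the files the client lacks or holds in another version."""
    replace = {
        path: server_files[path]
        for path, digest in server_desc.items()
        if client_desc.get(path) != digest
    }
    delete = sorted(p for p in client_desc if p not in server_desc)
    return {"replace": replace, "delete": delete}

def client_apply_diff(client_files, diff):
    """Client side: drop removed files, then overlay the new or changed ones."""
    updated = dict(client_files)
    for path in diff["delete"]:
        updated.pop(path, None)
    updated.update(diff["replace"])
    return updated

# Hypothetical package contents; only the control file changed.
client_files = {"control": b"Version: 0.99.10-1\n", "usr/share/gimp/palette": b"same bytes"}
server_files = {"control": b"Version: 0.99.10-2\n", "usr/share/gimp/palette": b"same bytes"}

client_desc = {p: md5(d) for p, d in client_files.items()}
server_desc = {p: md5(d) for p, d in server_files.items()}  # precomputed in practice

diff = server_make_diff(client_desc, server_desc, server_files)
result = client_apply_diff(client_files, diff)
```

After the exchange, result holds the same files as the server's package,
and the diff carried only the control file.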

Because the server must have a little bit of smarts for this to work,
it's pretty important that it doesn't have to be constantly unzipping
the .deb files just to compare them.  Fortunately, the 'description
of the guts of the .deb file' needed is pretty similar to what 
Klee generates as "package certificates" with dpkg-cert (md5sums and
such) -- and I think he's going to build it into the .deb file format.
The server doesn't have to do too much processing - so I think it
might be practical to use this approach on a public server, whereas
the rsync approach might be too 'heavy' in terms of processing.
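
To illustrate why the server-side work can stay light, here is a sketch
that loads a stored description instead of unpacking the package.  The
on-disk layout is an assumption: one "digest  path" pair per line, as
printed by md5sum.  The real output format of dpkg-cert is not described
here, and parse_description() is a made-up name.

```python
def parse_description(text):
    """Parse md5sum-style lines ("<digest>  <path>") into a path -> digest dict."""
    desc = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        digest, path = line.split(None, 1)
        desc[path.strip()] = digest
    return desc

# Hypothetical stored description; the digests are placeholders.
stored = (
    "d41d8cd98f00b204e9800998ecf8427e  usr/share/gimp/palette\n"
    "0cc175b9c0f1b6a831c399e269772661  usr/share/doc/gimp-data/copyright\n"
)

server_desc = parse_description(stored)
```

Comparing two such dicts is a handful of string comparisons per file,
with no decompression at all.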

It would even be possible to use something like dpkg-repack to build
a package based on the files that are already installed -- generate
a certificate, and get the "Debian diff" back to generate a valid
updated package!  This might be a bit heavy on processing on the
client side - but might be faster than downloading over a slow link.

The extremely simple protocol would basically be just an HTTP 'put'
request to a CGI script running on a webserver.  That way, anybody behind
a firewall would still be able to access public servers on the Internet.
That would be better than rsync, which is dependent on rsh or ssh.
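
A client-side sketch of that request, using Python's standard
urllib.request.  The URL, the text/plain body of "digest  path" lines,
and the build_request() helper are all assumptions for illustration; no
such CGI script exists yet.  The request is only constructed here, not
sent.

```python
import urllib.request

def build_request(url, description):
    """Build an HTTP PUT carrying the package description as md5sum-style lines."""
    body = "".join(
        f"{digest}  {path}\n" for path, digest in sorted(description.items())
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "text/plain"},
    )

# Hypothetical endpoint and description.
req = build_request(
    "http://example.org/cgi-bin/debian-diff",
    {"control": "0cc175b9c0f1b6a831c399e269772661"},
)
```

Passing req to urllib.request.urlopen() would send it; the response body
would then be the "Debian diff" to apply.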

I'd like to implement this as soon as possible. But I'm entertaining
company this long weekend - then I have to work at a client's place in
Vancouver next week.  So don't hold your breath.  If someone else wants
to steal the idea -- feel free to do it.

(I've still got dwww to do + the Debian developer database thingy too)


 - Jim
