Re: Apt & rsync
On Sat, 16 Oct 1999, Jason Gunthorpe wrote:
> On Sat, 16 Oct 1999, Dylan Thurston wrote:
> > I'm not sure we're talking about the same thing here. Let me summarize
> > the algorithm as I currently see it:
> >
> > a) Debian developer creates the .deb file, using a modified rsync-friendly
> > gzip. (Let's call it 'rzip' for now.)
>
> ** This step needs to have the 'last' .deb to pass to rzip to allow it to
> fragment it properly.
Here I disagree. The suggestion Andrew made in his thesis is to
essentially cut the file up into pieces at predictable points and compress
each piece separately. This is what I meant by "rsync-friendly"; it's
reproducible without needing the old .debs. I looked at the gzip source
code last night, and it apparently cuts the file up anyway; it just needs
to be modified so that the cut-points are predictable. A big benefit: the
file can then be uncompressed using an unmodified gzip.
To find the `predictable points' to cut the file at, Andrew suggested
cutting when a running hash (e.g., the one used in rsync) takes some
predetermined value.
This means that sufficiently long unchanged portions will compress to
largely the same thing, regardless of context and without knowing the
previous compressed file.
I have no idea how much this would hurt the compression.
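To make the idea concrete, here is a minimal sketch in Python. It is not
the gzip modification itself and not the rsync checksum: it uses a plain
polynomial rolling hash, and the names split_chunks and compress_rsyncable,
the window size, and the average chunk size are all assumptions of mine.
Each piece becomes its own gzip member, so the concatenated result is still
a valid gzip stream.

```python
import gzip

WINDOW = 48            # bytes covered by the rolling hash (assumed value)
AVG_CHUNK = 4096       # cut roughly once per this many bytes (assumed value)
BASE = 257
MOD = (1 << 31) - 1    # prime modulus for the polynomial hash
OUT_FACTOR = pow(BASE, WINDOW, MOD)


def split_chunks(data: bytes) -> list:
    """Cut data wherever the hash of the last WINDOW bytes hits a fixed value.

    A cut point depends only on the WINDOW bytes before it, so the same
    content yields the same cuts regardless of what surrounds it.
    """
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        h = (h * BASE + byte) % MOD
        if i >= WINDOW:
            # Drop the byte that just slid out of the window.
            h = (h - data[i - WINDOW] * OUT_FACTOR) % MOD
        if i + 1 >= WINDOW and h % AVG_CHUNK == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data) or not chunks:
        chunks.append(data[start:])
    return chunks


def compress_rsyncable(data: bytes) -> bytes:
    """Compress each chunk as an independent gzip member and concatenate."""
    # mtime=0 keeps the gzip headers identical from run to run.
    return b"".join(gzip.compress(chunk, mtime=0) for chunk in split_chunks(data))
```

The output of compress_rsyncable() goes through gzip.decompress() (or plain
gunzip) unchanged, since multi-member gzip files are already legal. The
cost is that every member restarts compression with an empty dictionary and
carries its own header and trailer, which is where the loss in compression
would come from.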
> > b) Developer uploads the .deb to incoming using ftp.
> > c) Archive is distributed to the mirrors using the current rsync setup,
> > which would run faster because of the friendly compression.
>
> This isn't true because rsync can't handle renaming a file. A change of
> contents always involves a rename so without rsync changes to be aware of
> our archive structure there is no win.
OK. But making rsync aware of our archive structure is feasible.
Stupid hack: would rsync behave right if there were also a hard (or soft)
link someplace with a file name that was just the package name?
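A sketch of the link layout I have in mind, in Python. The helper names
stable_name and link_stable_name are made up for illustration, and I am
assuming the usual name_version_arch.deb file naming. Whether rsync would
then actually do the right thing with the link is exactly the open question.

```python
import os


def stable_name(deb_filename: str) -> str:
    """Map 'foo_1.2-3_i386.deb' to 'foo.deb' (package name only)."""
    package = os.path.basename(deb_filename).split("_", 1)[0]
    return package + ".deb"


def link_stable_name(deb_path: str) -> str:
    """Hard-link the versioned .deb under a name that never changes."""
    directory = os.path.dirname(deb_path)
    link_path = os.path.join(directory, stable_name(deb_path))
    tmp_path = link_path + ".tmp"
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)
    os.link(deb_path, tmp_path)
    # Rename over any older link so the stable name always exists.
    os.replace(tmp_path, link_path)
    return link_path
```

With this, uploading foo_1.1-1_i386.deb and later foo_1.2-1_i386.deb leaves
a single foo.deb in the directory that always points at the newest contents.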
> As Gary suggested, rsyncing over the ungziped .deb is probably ultimately
> better, but complicated and expensive server+client side.
It's a feasible alternative, yes. You mentioned the problems: extra load
server side; need to implement some compression on top of rsync;
comparable load client side; potential problems signing files. (I don't
see extra client load beyond the other proposed solutions.) The only
serious one I can see is the extra server load. I wonder how bad it would
be; it doesn't seem like it would be much over running the rsync algorithm
in the first place.
--Dylan Thurston
dpt@math.berkeley.edu