
Re: Apt and rsync... I know...



On Sun 18 Jan 2004 2:08 am, Matt Zimmerman wrote:
> On Sun, Jan 18, 2004 at 01:43:30AM -0700, Doug Holland wrote:
> > That's it.  One piddly little patch, which most likely affected one line
> > of source code, required me to waste 2.5 hours of dialup bandwidth
> > downloading a .deb file that's almost identical to the one I downloaded
> > yesterday.  Am I the only one who finds this irritating?
>
> This all makes sense, except for the "required" part.
>
> > I remember the reason why apt is not currently doing rsync is because it
> > hogs I/O and CPU cycles on the Debian servers.  That's a valid reason,
> > but surely there are ways around it.
>
> Surely.  First, arrange for all of the packages in Debian to be compressed
> using gzip --rsyncable.  Then, find a server with gobs of disk, network,
> I/O and CPU resources, mirror the Debian archive, and run an anonymous
> rsync server.  Then, write an rsync method for apt, or use rproxy, or
> whatever.
>
> That is the approximate order in which things would need to happen.  It's a
> bit early in the process to be pointing fingers at apt.
>
> > I suggest that rsync files be precalculated, so rsync downloads don't
> > have to be crunched on the fly.  The servers would store .deb files -
> > foo-x.y.z.deb, and they would store the rsync diffs between it and the
> > previous version - foo.x.y.z-1_x.y.z.rsdeb.  That way, if the user doing
> > an apt-get upgrade has the previous .deb file in his cache, apt would
> > download the rsync diff file instead of the full .deb, saving loads of
> > bandwidth, and since the rsyncs are precomputed and cached, the servers
> > don't get hosed.
> >
> > Am I totally off base suggesting this?
>
> Yes.  It shows that you haven't read the previous discussions that you
> alluded to at the beginning of your message, because they explain why this
> is not a good solution.
>
> Bug #128818 has some starting points.
>
> --
>  - mdz

Yes, I have read the discussions.

The reason rsync would be such a disk and CPU hog for the servers is that 
it would be recomputing rsyncs over & over for every file requested, and I 
imagine the latest updates are downloaded thousands of times a day.  
Rather than doing the same computation thousands of times, it would be 
easier to do it only once, then cache the result on the server as a 
foo.deb.rsync file.  I think it's doable without hosing the servers.

The catch:

In order to use the rsync file in place of the full .deb, the user has to have 
the previous version of the .deb file on his system.  If he's two versions 
behind, he has to download the full .deb.  If he has cleared apt's package 
cache, he's out of luck.
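
A minimal Python sketch of the client-side decision described above, assuming
a delta is published only against the immediately previous version.
choose_download and the file naming scheme are hypothetical, not part of apt.

```python
from typing import Optional, Set

def choose_download(package: str, target: str, previous: Optional[str],
                    cached_versions: Set[str]) -> str:
    """Fetch the precomputed delta only if the immediately previous .deb is
    still in the local package cache; otherwise fall back to the full .deb.
    File names are hypothetical, for illustration only."""
    if previous is not None and previous in cached_versions:
        return f"{package}_{previous}_{target}.deb.rsync"
    return f"{package}_{target}.deb"
```

Being two versions behind and having an emptied cache both end up on the
full-download path, which is exactly the catch.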

/me dons his asbestos undies...
