Re: Apt & rsync
On Sat, 16 Oct 1999, Jason Gunthorpe wrote:
> [ Discussion of using rsync for Debian package distribution ]
>
> Let me be more clear, we will never get mirrors to run anon-rsyncd with a
> decent user limit because rsyncd takes up crazy amounts of cpu/memory.
Ah, that's a pity. Any chance of improving this? I suppose it's
inherently going to work harder than an ftp daemon, but, for instance, it
shouldn't be hard to modify rsync to cache the checksums. (The strong
file checksums are easiest, but the block checksums should also be
feasible.) This would be a big win, to only run the checksums once rather
than once per client.
I haven't used rsync, so I don't know about the feasibility of this. I've
CC:ed the rsync list in case they have comments.
> On Fri, 15 Oct 1999, Dylan Thurston wrote:
> > This is quite true, but raises the obvious question: why not change gzip
> > so that it doesn't scramble the contents so badly? This would have a
> > slight cost in compression percentage, but bandwidth gains should more
> > than make up for it. Andrew Tridgell addresses the issue in his original
>
> Although interesting, it seems to me that this pre-assumes that you have
> access to the previous uncompressed version in order to get the
> 'pre-deteremined' hash value.
You need access to the previous _compressed_ version, yes. So people
tracking unstable would probably want to do this, while people sticking
with stable distributions probably wouldn't. It's a disk space/bandwidth
tradeoff.
> Otherwise this compression algorithm wouldn't be terribly bad to have
> aroud, we already use rsync for mirroring, but we mirror about 100 meg of
> .gz files each day beacuse of this problem :<
Does it really make sense to use rsync here? According to this message
http://samba.anu.edu.au/listproc/rsync/1214.html from the rsync mailing
list, rsync is optimized for low-bandwidth connections, which these sites
obviously are not. Apparently the md5 checksums start to dominate the
transmission time.
> > Is there interest in this? Is it a good idea? My weekends are a bit busy
> > just now, but it sounds like a fun project.
>
> What I would like to see is an abuse of HTTP that would allow a mod-rsync
> to be written for apache. It would have exactly two functions, send a set
> of checksums for a file and send the given set of fragments. Unlike rsyncd
> the server would then be very light weight and everything complicated and
> CPU/IO intensive could be implemented client side. A mechanism to cache
> checksums could even be implemented...
Can you say what the advantages of using HTTP rather than the rsync
protocol would be?
> Something I have been meaning to write is an rsync-like program that uses
> a detached precomputed checksum file. It would operate like the
> Pseudo-image kit, but instead of operating blindly on a mirror it would
> reconstruct the initial pass exactly using the checksum information and
> then directly move to httping the missing portions... Right now using the
> Pseudo-image kit and rsync is extremely hard on both the server and the
> client :<
What's the pseudo-image kit?
> Jason
--Dylan Thurston
dpt@math.berkeley.edu
Reply to: