Re: package pool and big Packages.gz file

To: Jason Gunthorpe <jgg@debian.org>
Cc: Goswin Brederlow <goswin.brederlow@student.uni-tuebingen.de>, Sam Vilain <sam@vilain.net>, debian-devel@lists.debian.org
Subject: Re: package pool and big Packages.gz file
From: Goswin Brederlow <goswin.brederlow@student.uni-tuebingen.de>
Date: 08 Jan 2001 04:28:47 +0100
Message-id: <873deusowg.fsf@mose.informatik.uni-tuebingen.de>
In-reply-to: Jason Gunthorpe's message of "Sun, 7 Jan 2001 17:34:38 -0700 (MST)"
References: <Pine.LNX.3.96.1010107172635.21865M-100000@wakko.deltatee.com>

>>>>> " " == Jason Gunthorpe <jgg@debian.org> writes:

     > On 7 Jan 2001, Goswin Brederlow wrote:

    >> Actually the load should drop, provided the following features
    >> are added:
    >> 
    >> 1. cached checksums and pulling instead of pushing
    >> 2. client-side unpacking of compressed streams

     > Apparently reversing the direction of rsync infringes on a
     > patent.

When I rsync a file, rsync starts ssh to connect to the remote host
and starts rsync there in the reverse mode.

You say that the receiving end is violating a patent and the sending
end is not?

Hmm, which patent anyway?

So I have to fork a rsync-non-US because of a patent?

     > Plus there is the simple matter that the file listing and file
     > download features cannot be separated. Doing a listing of all
     > files on our site is non-trivial.

I don't need to get a file listing, apt-get tells me the name. :)
Also I can do "rsync -v host::dir" and parse the output to grab the
actual files with another rsync. So file listing and downloading are
absolutely separable.
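
A minimal sketch of the two-step idea above (list first, then fetch by
name). It assumes each listing line has the form "permissions size date
time name"; that layout, the sample text, and the helper names
parse_listing and fetch_command are illustrative assumptions, not
something taken from this thread. The second rsync is only built as an
argument list here, not executed.

```python
def parse_listing(output: str) -> list[tuple[str, int]]:
    """Return (name, size) for each regular file in a listing.

    Assumes lines shaped like:
        -rw-r--r--     1234567 2001/01/07 17:30:00 Packages.gz
    """
    files = []
    for line in output.splitlines():
        fields = line.split(None, 4)  # keep spaces inside the file name
        if len(fields) != 5 or not fields[0].startswith("-"):
            continue  # skip directories, links, banners and blank lines
        size_text = fields[1].replace(",", "")  # tolerate "52,100" style sizes
        if not size_text.isdigit():
            continue
        files.append((fields[4], int(size_text)))
    return files


def fetch_command(module: str, name: str, dest: str = ".") -> list[str]:
    """Build the argument list for the second rsync that grabs one file."""
    return ["rsync", f"{module}/{name}", dest]


SAMPLE = """\
drwxr-xr-x        4096 2001/01/07 17:30:00 .
-rw-r--r--     1234567 2001/01/07 17:30:00 Packages.gz
-rw-r--r--      52,100 2001/01/07 17:30:00 Release
"""

for name, size in parse_listing(SAMPLE):
    print(size, " ".join(fetch_command("host::dir", name)))
```

Since apt-get already knows the file name, the listing step can be
skipped entirely and only the second command is needed.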

Doing a listing of all files would probably result in a timeout. The
hard drives are too slow.

     > Once you strip all that out you have rproxy.

     > Reversed checksums (with a detached checksum file) is something
     > someone should implement for debian-cd. You could even quite
     > reasonably do that totally using HTTP and not run the risk of
     > rsync load at all.

At the moment the client calculates one rolling checksum and one md5sum
per block.

The server, on the other hand, calculates the rolling checksum per
byte and for each hit it calculates an md5sum for one block.
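
The division of work described above can be sketched as follows. This
is a simplified illustration, not rsync's actual code: the weak checksum
is the usual two-part rolling sum, hashlib.md5 stands in for the strong
hash simply because the text says md5sum (the hash a given rsync version
really uses may differ), and the scan checks every byte offset without
skipping ahead after a match. All function names are made up for the
example.

```python
import hashlib

M = 1 << 16  # each half of the weak checksum is kept modulo 2**16


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute the two halves (a, b) of the weak checksum from scratch."""
    a = sum(block) % M
    b = sum((len(block) - i) * byte for i, byte in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, size: int) -> tuple[int, int]:
    """Slide the window one byte: drop out_byte at the front, add in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - size * out_byte + a) % M
    return a, b


def signatures(old: bytes, size: int) -> dict:
    """Client side: one weak checksum and one md5sum per whole block."""
    table = {}
    for offset in range(0, len(old) - size + 1, size):
        block = old[offset:offset + size]
        table.setdefault(weak_checksum(block), []).append(
            (hashlib.md5(block).digest(), offset))
    return table


def scan(new: bytes, table: dict, size: int):
    """Server side: weak checksum at every byte, md5sum only on a weak hit.

    Returns (matches, md5_calls); matches holds (new_offset, old_offset).
    """
    matches, md5_calls = [], 0
    if len(new) < size:
        return matches, md5_calls
    a, b = weak_checksum(new[:size])
    pos = 0
    while True:
        candidates = table.get((a, b))
        if candidates:
            md5_calls += 1  # this is the expensive part the text worries about
            digest = hashlib.md5(new[pos:pos + size]).digest()
            for strong, old_offset in candidates:
                if strong == digest:
                    matches.append((pos, old_offset))
                    break
        if pos + size >= len(new):
            break
        a, b = roll(a, b, new[pos], new[pos + size], size)
        pos += 1
    return matches, md5_calls
```

The md5_calls counter makes the cost visible: every weak hit on the
server triggers one md5sum over a full block, whether or not the block
turns out to match.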

Given a 650MB file, I don't want to know the hit/miss ratios for the
rolling checksum and the md5sum. Must be really bad.

The smaller the file, the fewer false rolling-checksum hits there are,
and so the fewer wasted md5sums need to be calculated.

     > Such a system for Package files would also be acceptable I
     > think.

For Packages files even cvs -z9 would be fine. They are small
compared to the rest of the load, I would think.

But I, just as you do, think that it would be a really good idea to
have precalculated rolling checksums and md5sums, maybe even for
various block sizes, and let the client do the time-consuming guessing
and calculating. That would prevent rsync from reading every file it
serves twice, as it does now when they are dissimilar.
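
A rough sketch of that reversed arrangement, under stated assumptions:
the server computes the per-block checksums once, for a few block sizes,
and publishes them next to the file; the client slides over its own old
copy and works out which blocks it still needs. The in-memory layout of
the "checksum file", the block sizes and the function names are all
hypothetical. For brevity the client recomputes the weak checksum for
each window instead of rolling it.

```python
import hashlib

M = 1 << 16


def weak(block: bytes) -> int:
    """Weak checksum of one block, packed into a single integer."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return (b << 16) | a


def make_checksum_file(data: bytes, block_sizes=(4, 8)) -> dict:
    """Server side, run once per file: checksums for each block size."""
    out = {}
    for size in block_sizes:
        out[size] = [
            (weak(data[off:off + size]),
             hashlib.md5(data[off:off + size]).hexdigest())
            for off in range(0, len(data), size)
        ]
    return out


def blocks_to_fetch(local: bytes, checksums: list, size: int) -> list:
    """Client side: indices of remote blocks not present in the local file.

    A short final block never matches a full-size window, so it is
    always reported as missing.
    """
    wanted = {}
    for index, (w, strong) in enumerate(checksums):
        wanted.setdefault(w, []).append((strong, index))
    found = set()
    for pos in range(0, len(local) - size + 1):
        window = local[pos:pos + size]
        hits = wanted.get(weak(window))
        if not hits:
            continue
        digest = hashlib.md5(window).hexdigest()  # only computed on a weak hit
        for strong, index in hits:
            if strong == digest:
                found.add(index)
    return [i for i in range(len(checksums)) if i not in found]


remote = b"abcdefghijklmnopqr"
local = b"zzabcdzzmnopzz"
sums = make_checksum_file(remote)
print(blocks_to_fetch(local, sums[4], 4))
```

With this split the server does no per-request checksum work at all; it
only serves the checksum file and the requested blocks, which could
plausibly be done over plain HTTP as suggested in the quoted text.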

May the Source be with you.
                        Goswin
