
Re: better RSYNC mirroring , for .debs and others



tom rothamel is working on a project called debdiff that works towards the
same goal. please read his announcement thread, which is archived at
http://www.debian.org/Lists-Archives/debian-devel-0002/msg00391.htm.

i like the idea of rsync modules, but the concept your project misses is that
even a small addition or subtraction in the beginning of a file ruins
rsync's speed bonus because it then has to send everything. take a look at
tom's code. i think you'll find it interesting.

Andrea Mennucc1 (debian@Tonelli.sns.it) wrote:
> 
> hi everybody
> 
> I have implemented
> a good idea for reducing the download load on everybody who
> mirrors a lot of data using rsync,
> such as the people who mirror Debian GNU/Linux:
> currently, many Debian "leaf mirrors" use rsync
> to mirror from the main .debian.org hosts.
> 
> rsync contains a wonderful algorithm to speed up downloads when mirroring
> files which have only minor differences;
> the only problem is that this algorithm is ALMOST NEVER used
> when mirroring a Debian repository.
> Indeed, whenever a new version of a
> package enters the Debian repository,
> the package has a different file name; for this reason rsync just does a
> full download.
> In summary, rsync currently gains some speedup only
> when it downloads Packages.gz files, or when it skips an already existing
> package.
> 
> well, I have just implemented a simple
> way to use the algorithm even when downloading the .debs.
> 
> here is a simple example
> 
> suppose the current situation is
>     $REMOTE::/pub/debian/dist/bin/dpkg_2.deb
> whereas locally we have
>     /debian/dist/bin/dpkg_1.deb
> 
> when rsync looks for a local version of
>     /debian/dist/bin/dpkg_2.deb
> and there is none, rsync does
>     ls -t /debian/dist/bin/dpkg_*
> and takes the most recent file it finds
>
> this way, rsync will use the file /debian/dist/bin/dpkg_1.deb
> to try to speed up the download of $REMOTE::/pub/debian/dist/bin/dpkg_2.deb
> (using its fabulous algorithm)
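
The lookup described in the example can be sketched in a few lines of Python. This is only an illustration of the idea, not the code from the attached patch (which modifies rsync's C source); the function name and the rule "everything before the first underscore is the package name" are assumptions made for the sketch.

```python
import glob
import os


def find_alternative_version(wanted_path):
    """Pick a local basis file for wanted_path.

    Returns wanted_path itself if it already exists, otherwise the most
    recently modified file in the same directory with the same package
    prefix (the equivalent of `ls -t dir/pkg_*`), or None if there is none.
    """
    if os.path.exists(wanted_path):
        return wanted_path

    directory, filename = os.path.split(wanted_path)
    # Assumption: the package name is everything before the first "_".
    package = filename.split("_", 1)[0]
    pattern = os.path.join(glob.escape(directory), glob.escape(package) + "_*")

    candidates = [p for p in glob.glob(pattern) if os.path.isfile(p)]
    if not candidates:
        return None
    # Newest modification time wins, as with `ls -t`.
    return max(candidates, key=os.path.getmtime)
```

With only /debian/dist/bin/dpkg_1.deb present locally, asking for /debian/dist/bin/dpkg_2.deb would return the dpkg_1.deb path, which rsync could then use as the basis for its delta transfer.
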
> 
> BIG PRO: my new "rsync" is totally compatible with the old one
> 
> Conclusion:
> this idea would make all Debian mirror people happier
> (especially if they mirror "unstable"; consider that, often,
> when a new version of a package is released, only small changes are made...
> sometimes only the .postinst, or the like, has really changed;
> alas, this may be masked by the compression, though: but see TODO)
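
The masking effect of compression can be illustrated with a toy experiment. The sketch below is not rsync's algorithm: it compares raw block contents directly instead of using a rolling weak checksum plus a strong checksum, and the 64-byte block size and the test data are arbitrary choices made for the illustration. It measures how much of an old file could be reused when one byte is added at the front of the new file, first on the raw data and then on the zlib-compressed data.

```python
import zlib

BLOCK = 64  # illustrative block size, not rsync's default


def reusable_fraction(old, new, block=BLOCK):
    """Fraction of `old` covered by fixed-size blocks that reappear in `new`.

    Blocks of `old` are taken at fixed offsets; `new` is scanned at every
    byte offset, which is what lets a match survive an insertion.
    """
    if not old:
        return 0.0
    known = {old[i:i + block] for i in range(0, len(old) - block + 1, block)}
    matched = 0
    i = 0
    while i + block <= len(new):
        if new[i:i + block] in known:
            matched += block
            i += block  # skip past the matched block
        else:
            i += 1      # slide the window by one byte
    return matched / len(old)


# Hypothetical payload: the "new version" differs by one byte at the front.
old = b"".join(b"line %d\n" % n for n in range(5000))
new = b"X" + old

raw = reusable_fraction(old, new)
packed = reusable_fraction(zlib.compress(old), zlib.compress(new))
print(f"raw: {raw:.2f}  compressed: {packed:.2f}")
```

On the raw data nearly every block is found again one byte later, while the compressed streams share far less, which is the situation the "gzip" and "deb" modules in the TODO are meant to address.
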
> 
> I attach two files: the first file is a diff showing where, in
> the "rsync 2.4.1" source tree, I have made modifications;
> the second is a .tgz of all the new and modified files you
> need to build the new rsync.
> To build, first download
> the source code (see rsync.samba.org/rsync/download.html),
> then unpack the file rsync.diffsrc.tgz in the source tree,
> and build.
> 
> You may also get the compiled binary directly as 
>  ftp://tonelli.sns.it/pub/rsync/rsync
> and the new code altogether in
>  ftp://tonelli.sns.it/pub/rsync
> 
> TODO:
> there are some potentially good ideas here:
> 
> 1) the idea is to add "modules" to rsync:
>   a "gzip" module, a "deb" module, an "rpm" module...;
>   currently, modules just look for an older local version of the file;
>
>   in a future version, each module would
>   apply to a certain type of file, and create
>   another file to pass to "rsync",
>   so that this other file would probably lead to more speedup:
>   e.g., the "gzip" module would unzip files before doing comparisons,
>   and the "deb" module would unzip the data.tar.gz part of a package
> 
>  CONS: this would not be backward compatible, of course
>   
>   The idea is that a module may provide the following calls:
>    find_alternative_version_MOD()
>    receive_file_MOD()
>    send_file_MOD()
>    
>  Currently, only find_alternative_version_deb() is implemented.
> 
>  If rsync uses only the find_alternative_version_MOD()
>  calls, then it is "backward compatible" with the usual version
>  (in a sense, it is doing what the option --compare-dest already does,
>   only in a smarter way).
>
>  I have not yet implemented any receive_file_MOD() or
>  send_file_MOD(): these would need a change in the protocol;
>  I hope that the rsync authors will give permission.
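
One way to picture the module idea is a table of per-file-type hooks. The Python sketch below is a hypothetical illustration, not the interface from the patch: only the name find_alternative_version_deb comes from the message above, while the registry, the decorator and find_basis_file are invented for the example. It covers only the backward-compatible find_alternative_version hook, since receive_file_MOD() and send_file_MOD() would need a protocol change.

```python
import glob
import os
from typing import Callable, Dict, Optional

# Hypothetical registry: file suffix -> hook that proposes a local basis file.
Finder = Callable[[str], Optional[str]]
FINDERS: Dict[str, Finder] = {}


def register_finder(suffix):
    """Decorator that registers a find_alternative_version hook for a suffix."""
    def decorator(func):
        FINDERS[suffix] = func
        return func
    return decorator


@register_finder(".deb")
def find_alternative_version_deb(wanted_path):
    """Return the newest local .deb of the same package, or None."""
    directory, filename = os.path.split(wanted_path)
    package = filename.split("_", 1)[0]  # assumed naming: name_version.deb
    pattern = os.path.join(glob.escape(directory),
                           glob.escape(package) + "_*.deb")
    candidates = [p for p in glob.glob(pattern) if os.path.isfile(p)]
    return max(candidates, key=os.path.getmtime) if candidates else None


def find_basis_file(wanted_path):
    """Use the exact file if present, else ask the module for that file type."""
    if os.path.exists(wanted_path):
        return wanted_path
    for suffix, finder in FINDERS.items():
        if wanted_path.endswith(suffix):
            return finder(wanted_path)
    return None  # no module applies: fall back to a full download
```

Because the hook only chooses which local file to use as the basis, the peer on the other end sees an ordinary rsync transfer, which is why this part stays compatible with unmodified versions.
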
> 
> 1b) My idea (not sure) is that "rsync" may work if provided with "named pipes"
>  instead of files: indeed, according to the technical report,
>  it needs to read the local and remote files only once,
>  and then it writes the local file without ever seeking backwards;
>  in that case, the above modules would not need to
>  use disk space or create temporary files.
> 
> 
> 2) for a faster apt-get downloading,
>  it may be possible to do the same trick WHEN UPGRADING
>  INSTALLED PACKAGES!  Here is the idea:
>   "apt-get creates a local version of the package
>   (using dpkg-repack)
>   and runs rsync against it to get the remote version"
>  
> 
> 
> -- 
> Andrea C. Mennucci,   Scuola Normale Superiore, Pisa, Italy

-- 
(jacob kuntz)                    jpk@cape.com jake@{megabite,underworld}.net
(megabite systems)                       "think free speech, not free beer."

