
Re: RFDiscussion: Big Packages.gz and Statistics and Comparing solution

>>>>> " " == zhaoway  <zhaoway@public1.ptt.js.cn> writes:

     > Hi, [Sorry for breaking the thread, my POP3 provider stopped.]
     > [Please Cc: me! <zhaoway@public1.ptt.js.cn>. Sorry! ;-)]

     > 1. RFDiscussion on big Packages.gz

     > 1.1. Some statistics

     > % grep-dctrl -P \
     >     -sPackage,Priority,Installed-Size,Version,Depends,Provides,Conflicts,Filename,Size,MD5sum \
     >     -r '.*' \
     >     ftp.jp.debian.org_debian_dists_unstable_main_binary-i386_Packages \
     >     | gzip -9 > test.pkg.gz
     > % gzip -9 ftp.jp.debian.org_debian_dists_unstable_main_binary-i386_Packages
     > % ls -alF *.gz
     > -rw-r--r-- 1 zw zw 1157494 Jan 7 21:20 ftp.jp.debian.org_debian_dists_unstable_main_binary-i386_Packages.gz
     > -rw-r--r-- 1 zw zw  341407 Jan 7 21:23 test.pkg.gz

Ahh, what does it do? Just take out the descriptions?
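
If I read the -s option right, it prints only the listed fields, so the
effect should be roughly the following. This is only a sketch: it uses
awk instead of grep-dctrl, the stanza is made up, and the field list is
shortened.

```shell
# Made-up sample stanza; real Packages entries carry more fields.
sample='Package: demo
Priority: optional
Version: 1.0-1
Description: a demo package
 This long text is a continuation line
 and makes up most of the stanza.
Filename: pool/main/d/demo/demo_1.0-1_i386.deb
Size: 1234'

# Keep only the listed fields. Continuation lines (leading blank)
# share the fate of the field they belong to; empty lines that
# separate stanzas are passed through.
stripped=$(printf '%s\n' "$sample" | awk '
    /^$/      { print; next }
    /^[^ \t]/ { keep = ($0 ~ /^(Package|Priority|Version|Filename|Size):/) }
    keep')

printf '%s\n' "$stripped"
```

Everything belonging to Description is dropped, the rest stays as it was.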

     > This approach is simple, straightforward and almost compatible.
     > But it could accept 10K more packages coming into Debian with
     > little loss. Worth consideration, IMHO.

     > Better, if `Description:' etc. could go into a separate gzipped
     > file along with the Debian package.

The problem is that people want to browse descriptions to find a
package fairly often or just run "apt-cache show package" to see what
a package is about. So you need a method to download all descriptions.

Also, many small files compress far worse than one big file.
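
You can try that yourself with something like the following. It is a
sketch on made-up stanzas (200 tiny files with hypothetical package
names), not on real package data, so the exact numbers mean nothing;
only the direction matters.

```shell
# Build 200 small, similar control stanzas (made-up data).
tmp=$(mktemp -d)
i=1
while [ "$i" -le 200 ]; do
    printf 'Package: demo%d\nPriority: optional\nSection: misc\nDescription: demo package number %d\n\n' \
        "$i" "$i" > "$tmp/$i.ctl"
    i=$((i + 1))
done

# Total size when every file is gzipped on its own ...
separate=$(for f in "$tmp"/*.ctl; do gzip -9 -c "$f"; done | wc -c)
# ... versus gzipping all of them as one stream.
combined=$(cat "$tmp"/*.ctl | gzip -9 | wc -c)

echo "separate: $separate bytes, combined: $combined bytes"
rm -rf "$tmp"
```

Each separate gzip file pays for its own header and cannot reuse the
strings it shares with the other files.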

     > 2. Compare with DIFF and RSYNC method of APT

     > 2.1. They need server support. (More than a directory layout
     > and client tool changing.)

As far as I can see, there's no server support needed for rsync to
operate better on compressed files.

     > 2.2. If you don't update for a long time, DIFF won't
     > help. RSYNC helps less.

If you update often, saving even 1 byte every time is worth it. If you
update seldom, it doesn't really matter that you download a big
Packages.gz. You would have to download all the small Packages.gz
files anyway.

And after that you download 500 MB of updates. So who cares about 2 MB?

Also, diff and rsync do a great job even after a long time:

% diff potato_Packages woody_Packages | gzip -9 | wc --bytes

% ls -l /debian/dists/woody/main/binary-i386/Packages.gz
-rw-r--r--    1 mrvn     mrvn       955259 Jan  6 21:03 /debian/dists/woody/main/binary-i386/Packages.gz

So you see, between potato and woody, diff saves about 60%.
Also note that rsync usually performs better than cvs, since it does
not include the to-be-removed lines in the download.
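
To get such a figure for two files of your own, something along these
lines should do. It is a sketch on made-up stand-ins for the two
Packages files (1000 hypothetical stanzas, every tenth one with a new
version), so the percentage it prints says nothing about the real
archive.

```shell
tmp=$(mktemp -d)

# Made-up "old" and "new" lists: same packages, every tenth upgraded.
i=1
while [ "$i" -le 1000 ]; do
    printf 'Package: demo%d\nVersion: 1.0\n\n' "$i" >> "$tmp/old"
    if [ $((i % 10)) -eq 0 ]; then v=1.1; else v=1.0; fi
    printf 'Package: demo%d\nVersion: %s\n\n' "$i" "$v" >> "$tmp/new"
    i=$((i + 1))
done

# Compressed diff versus compressed full file.
diff_bytes=$(diff "$tmp/old" "$tmp/new" | gzip -9 | wc -c)
full_bytes=$(gzip -9 -c "$tmp/new" | wc -c)
saved=$(( 100 - 100 * diff_bytes / full_bytes ))

echo "diff: $diff_bytes bytes, full: $full_bytes bytes, saved: ${saved}%"
rm -rf "$tmp"
```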

     > 3. Additional benefits

     > Separating changelog.Debian and `Description:' etc. out into
     > a meta-info file could help users: 1) reduce the bandwidth eaten
     > 2) make their upgrade decisions more easily.

A global Description.gz might benefit from the fact that the
description doesn't change with each update, but the extra work needed
for this to really work is not worth it. It would only benefit people
who do daily mirroring, where rsync would do just as well.

