RFDiscussion: Big Packages.gz and Statistics and Comparing solutions
[Sorry for breaking the thread; my POP3 provider stopped working.]
[Please Cc: me! <firstname.lastname@example.org>. Sorry! ;-)]
1. RFDiscussion on big Packages.gz
1.1. Some statistics
% grep-dctrl -P -sPackage,Priority,Installed-Size,Version,Depends,Provides,Conflicts,Filename,Size,MD5sum -r '.*' ftp.jp.debian.org_debian_dists_unstable_main_binary-i386_Packages | gzip -9 > test.pkg.gz
% gzip -9 ftp.jp.debian.org_debian_dists_unstable_main_binary-i386_Packages
% ls -alF *.gz
-rw-r--r-- 1 zw zw 1157494 Jan 7 21:20 ftp.jp.debian.org_debian_dists_unstable_main_binary-i386_Packages.gz
-rw-r--r-- 1 zw zw 341407 Jan 7 21:23 test.pkg.gz
This approach is simple, straightforward, and almost compatible. It could
let 10K more packages come into Debian with little loss. Better still
would be if `Description:' etc. could go into a separate gzipped file
alongside the Debian package.
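For readers without grep-dctrl at hand, the field stripping in 1.1 can be sketched in Python roughly as below. This is only an illustration: the function name `slim_index` and the sample stanza are made up here, and the parser assumes the usual layout of blank-line-separated stanzas with indented continuation lines. It is not a replacement for grep-dctrl.

```python
# Fields kept in the slimmed-down index (same list as the -s option in 1.1).
KEEP = {
    "Package", "Priority", "Installed-Size", "Version", "Depends",
    "Provides", "Conflicts", "Filename", "Size", "MD5sum",
}

def slim_index(text, keep=KEEP):
    """Return the index text with only the `keep` fields left in each stanza."""
    stanzas = []
    for stanza in text.strip().split("\n\n"):
        kept = []
        keeping = False
        for line in stanza.splitlines():
            if line[:1] in (" ", "\t"):
                # Continuation line: belongs to the most recent field.
                if keeping:
                    kept.append(line)
                continue
            name = line.split(":", 1)[0]
            keeping = name in keep
            if keeping:
                kept.append(line)
        if kept:
            stanzas.append("\n".join(kept))
    return "\n\n".join(stanzas) + "\n"

# Hypothetical sample stanza, made up for illustration only.
SAMPLE = """\
Package: hello
Priority: optional
Section: devel
Installed-Size: 100
Maintainer: Someone <someone@example.org>
Version: 1.0-1
Depends: libc6
Filename: pool/main/h/hello/hello_1.0-1_i386.deb
Size: 12345
MD5sum: 0123456789abcdef0123456789abcdef
Description: example package
 A long description that only takes up space
 in the slimmed index.
"""

if __name__ == "__main__":
    print(slim_index(SAMPLE), end="")
```

Piping the result through `gzip -9` would then give something comparable to test.pkg.gz above, under the same assumptions.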
1.2. Little math
Suppose: 1) Site A gets K hits of `apt-get update' per day. With every day
that passes, M extra hits are added, as Debian grows more popular.
2) N new packages come into Debian every day. After `gzip -9',
each contributes 206 bytes to the old package index file, and 61 to
the new-format index file. The current number of packages is P.
3) Days passed are the X axis.
4) B is the byte size of the data flow for `apt-get update' on
that day, on the server side. (On the client side, K = 1, M = 0.)
B = (K + M*X) * (P + N*X) * 206 is for old format package index
B = (K + M*X) * (P + N*X) * 61 is for new format package index
[It's still an X^2 function, anyway, so in theory it's not a big deal. ;-)]
[It would be, only if we could eliminate the need for a package index. That is possible.]
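To make the X^2 remark concrete, the product can be expanded as follows. The symbol c is introduced here only for illustration and stands for the bytes per package (206 for the old format, 61 for the new one).

```latex
B(X) = c\,(K + M X)(P + N X)
     = c\,\bigl[\,K P + (K N + M P)\,X + M N\,X^{2}\,\bigr]

\frac{B_{\text{new}}(X)}{B_{\text{old}}(X)} = \frac{61}{206} \approx 0.296
```

So the new format scales the whole curve down to roughly 30% of the old one on every day X, but the M N X^2 term is still there.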
For K = 500, P = 6000, X = 0, Server side B is,
zw@q ~/tmp % echo $((6000*500*206))
618000000
zw@q ~/tmp % echo $((6000*500*61))
183000000
zw@q ~/tmp %
[Though caches could help servers a great deal in such cases.]
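The same arithmetic as a small Python sketch, in case someone wants to plug in their own numbers. The function name `daily_bytes` is made up for this example; the 206 and 61 byte figures and K = 500, P = 6000 are the ones from 1.2. M and N are set to 0 below only because they do not matter at X = 0.

```python
OLD_BYTES = 206  # gzipped bytes per package, old format (figure from 1.2)
NEW_BYTES = 61   # gzipped bytes per package, new format (figure from 1.2)

def daily_bytes(k, m, p, n, x, per_package):
    """B = (K + M*X) * (P + N*X) * per_package, in bytes for day X."""
    return (k + m * x) * (p + n * x) * per_package

if __name__ == "__main__":
    # Day 0 with K = 500, P = 6000; M and N are irrelevant at X = 0.
    old = daily_bytes(500, 0, 6000, 0, 0, OLD_BYTES)
    new = daily_bytes(500, 0, 6000, 0, 0, NEW_BYTES)
    print(old, new)    # 618000000 183000000
    print(old - new)   # 435000000 bytes less per day on the server side
```

That is about 618 MB per day for the old format against about 183 MB for the new one, before any caching.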
2. Compare with DIFF and RSYNC method of APT
2.1. They need server support. (More than a directory layout and client tool
2.2. If you don't update for a long time, DIFF won't help. RSYNC helps less.
3. Additional benefits
Separating changelog.Debian and `Description:' etc. out into a meta-info file
could help users: 1) reduce the bandwidth eaten 2) help their upgrade
echo <<EOF |cpp - -|egrep -v '(^#|^$)'
/* =|=X ++
* /\+_ p7 <email@example.com> */