Re: Debian's problems, Debian's future

On Wed, 2002-04-10 at 02:28, Anthony Towns wrote: 
> I think you'll find you're also unfairly weighting this against people
> who do daily updates. If you do an update once a month, it's not as much
> of a bother waiting a while to download the Packages files -- you're
> going to have to wait _much_ longer to download the packages themselves.
> I'd suggest your formula would be better off being:
> 	bandwidthcost = sum( x = 1..30, prob(x) * cost(x) / x )
> (If you update every day for a month, your cost isn't just one download,
> it's 30 downloads. If you update once a week for a month, your cost
> isn't that of a single download, it's four times that. The /x takes that
> into account)

I think it depends on what you're measuring.  I can think of two ways to
measure the "goodness" of these schemes (there are certainly others): 

1. What is the average bandwidth required at the server? 
2. What is the average bandwidth required at the client? 

The two questions are related: If users update after i days with
prob1(i), then the probability that a connection arriving at a server is
from a user updating after i days is 

	prob2(i) = prob1(i) / i / norm

where norm is a normalization factor so the probabilities sum to 1. 
I've been looking at question 2, and you're suggesting that I look at
question 1, except you forgot the normalization factor.  I think this is
what you mean.  Please correct me if I've misunderstood. 
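
To make the relation between the two questions concrete, here is a
minimal Python sketch.  It relies on the fact that a user who updates
every i days connects 1/i times per day.  The function names, the
example distribution and the costs are all made up for illustration;
they are not the inputs behind the tables in this message.

```python
def connection_probs(prob1):
    """Turn per-user update-interval probabilities into per-connection ones.

    prob1 maps an interval i (in days) to the fraction of users who update
    every i days.  Such a user connects 1/i times per day, so the share of
    connections they account for is proportional to prob1[i] / i.
    """
    weights = {i: p / i for i, p in prob1.items()}
    norm = sum(weights.values())  # normalization factor
    return {i: w / norm for i, w in weights.items()}


def expected_cost(probs, cost):
    """Expected cost under a distribution over update intervals."""
    return sum(p * cost[i] for i, p in probs.items())


# Hypothetical inputs: half the users update daily, half weekly.
prob1 = {1: 0.5, 7: 0.5}
cost = {1: 10.0, 7: 40.0}  # made-up download cost (in K) per update

# Question 2: average cost of one update, seen from a random client.
per_client = expected_cost(prob1, cost)

# Question 1: average cost of one connection, seen from the server.
per_connection = expected_cost(connection_probs(prob1), cost)

# The quoted formula: same weighting as question 1, without norm.
unnormalized = sum(p * cost[i] / i for i, p in prob1.items())
```

Dividing unnormalized by norm gives per_connection, so the two differ
only by a constant factor for a fixed prob1.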

Anyway, here are the results you asked for.  I'm NOT including the
normalization factor for easier comparison with your numbers.  My diff
numbers are a little different from yours mainly because I charge 1K of
overhead for each file request. 

Diff scheme 
days	dspace		ebwidth
1	12.000K		342.00K
2	24.000K		171.20K
3	36.000K		95.900K
4	48.000K		58.500K
5	60.000K		38.800K
6	72.000K		27.900K
7	84.000K		21.800K
8	96.000K		18.200K
9	108.00K		16.100K
10	120.00K		14.900K
11	132.00K		14.100K
12	144.00K		13.700K
13	156.00K		13.400K
14	168.00K		13.300K
15	180.00K		13.100K

Checksum file scheme with 4 byte checksums:
bsize	dspace		ebwidth
20	312.50K		173.70K
40	156.30K		89.300K
60	104.20K		62.200K
80	78.100K		49.300K
100	62.500K		42.200K
120	52.100K		37.900K
140	44.600K		35.300K
160	39.100K		33.600K
180	34.700K		32.700K
200	31.300K		32.200K
220	28.400K		32.100K
240	26.000K		32.200K
260	24.000K		32.500K
280	22.300K		33.000K
300	20.800K		33.600K
320	19.500K		34.300K
340	18.400K		35.100K
360	17.400K		35.900K
380	16.400K		36.800K
400	15.600K		37.700K
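
The dspace columns in both tables follow from simple arithmetic, which
the sketch below reproduces.  The 12K per day of diffs is read off the
diff table.  The Packages file size of 1,600,000 bytes is NOT stated in
this message; it is an assumption, chosen because it reproduces the
checksum table.  The ebwidth columns depend on the update distribution
and are not recomputed here.

```python
PACKAGES_BYTES = 1_600_000  # assumption: inferred from the table, not stated
CHECKSUM_BYTES = 4          # from "4 byte checksums"
DIFF_K_PER_DAY = 12.0       # from the diff table: 12K for each day kept


def checksum_dspace_k(bsize):
    """Size in K of a checksum file with one checksum per bsize-byte block."""
    return PACKAGES_BYTES / bsize * CHECKSUM_BYTES / 1024


def diff_dspace_k(days):
    """Disk space in K needed to keep `days` daily diffs."""
    return DIFF_K_PER_DAY * days


# Print a few rows to compare against the dspace columns above.
for bsize in (20, 100, 220, 400):
    print(bsize, round(checksum_dspace_k(bsize), 1))
for days in (1, 7, 15):
    print(days, diff_dspace_k(days))
```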

I'm probably underestimating the bandwidth of the checksum file scheme. 
I'm pretty confident about the diff scheme estimates, though.

I think the performance of the two schemes is pretty close.  Even though
this looks pretty good for the checksum file scheme, I'm still partial
to the diff scheme because 

- The checksum file scheme bottoms out at 32K, but the diff scheme can
reduce transfers to 13K (using more disk space).

- I trust my estimates of the diff scheme more.  The checksum file
(rsync-style) scheme will definitely take more bandwidth than my
estimates predict.

- As Debian gets larger, the checksum files will get larger, and so the
bandwidth will get larger.  So over time, any advantage of the checksum
file scheme will disappear.

- The diff scheme is more flexible and easier to tune.  The checksum
file scheme has a "sweet spot" at 220-byte blocks.  Predicting the actual
value of this sweet spot may be hard in the real world.
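
For reference, the minima quoted in these points can be read off the
tables mechanically.  The short Python sketch below does that; the
numbers are copied from the two tables in this message, and the
variable names are only illustrative.

```python
# (days of diffs kept, ebwidth in K), copied from the diff table
diff_table = [
    (1, 342.0), (2, 171.2), (3, 95.9), (4, 58.5), (5, 38.8),
    (6, 27.9), (7, 21.8), (8, 18.2), (9, 16.1), (10, 14.9),
    (11, 14.1), (12, 13.7), (13, 13.4), (14, 13.3), (15, 13.1),
]

# (block size in bytes, ebwidth in K), copied from the checksum table
checksum_table = [
    (20, 173.7), (40, 89.3), (60, 62.2), (80, 49.3), (100, 42.2),
    (120, 37.9), (140, 35.3), (160, 33.6), (180, 32.7), (200, 32.2),
    (220, 32.1), (240, 32.2), (260, 32.5), (280, 33.0), (300, 33.6),
    (320, 34.3), (340, 35.1), (360, 35.9), (380, 36.8), (400, 37.7),
]

# Row with the lowest expected bandwidth in each table.
best_days, best_diff_bw = min(diff_table, key=lambda row: row[1])
best_bsize, best_checksum_bw = min(checksum_table, key=lambda row: row[1])

print("diff scheme minimum:", best_days, "days,", best_diff_bw, "K")
print("checksum scheme minimum:", best_bsize, "bytes,", best_checksum_bw, "K")
```

The diff table is still decreasing at its last row, while the checksum
table has an interior minimum, which is the "sweet spot" referred to
above.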

