Re: Debian's problems, Debian's future
On Wed, 2002-04-10 at 02:28, Anthony Towns wrote:
> I think you'll find you're also unfairly weighting this against people
> who do daily updates. If you do an update once a month, it's not as much
> of a bother waiting a while to download the Packages files -- you're
> going to have to wait _much_ longer to download the packages themselves.
>
> I'd suggest your formula would be better off being:
>
> bandwidthcost = sum( x = 1..30, prob(x) * cost(x) / x )
>
> (If you update every day for a month, your cost isn't just one download,
> it's 30 downloads. If you update once a week for a month, your cost
> isn't that of a single download, it's four times that. The /x takes that
> into account)
I think it depends on what you're measuring. I can think of two ways to
measure the "goodness" of these schemes (there are certainly others):
1. What is the average bandwidth required at the server?
2. What is the average bandwidth required at the client?
The two questions are related: If users update after i days with
prob1(i), then the probability that a connection arriving at a server is
from a user updating after i days is
prob2(i)=(prob1(i)/i)*norm,
where norm is a normalization factor so the probabilities sum to 1.
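To make the relation concrete, here is a minimal sketch of the conversion from prob1 to prob2. The function name and the example distribution are made up for illustration; only the formula prob2(i) = (prob1(i)/i)*norm comes from the text above.

```python
def connection_distribution(prob1):
    """Convert a per-user update-interval distribution into the
    distribution seen by the server, one sample per incoming connection.

    prob1 maps an update interval i (in days) to the probability that a
    user updates every i days. A user who updates every i days generates
    1/i connections per day, hence the weighting by 1/i.
    """
    weights = {i: p / i for i, p in prob1.items()}
    norm = 1.0 / sum(weights.values())  # makes the result sum to 1
    return {i: w * norm for i, w in weights.items()}


# Hypothetical distribution, not measured data: half the users update
# daily, 30% weekly, 20% monthly.
prob1 = {1: 0.5, 7: 0.3, 30: 0.2}
prob2 = connection_distribution(prob1)

for days in sorted(prob2):
    print(f"every {days:2d} days: prob1={prob1[days]:.3f}  prob2={prob2[days]:.3f}")
```

With these made-up numbers, daily updaters are half of the users but account for the large majority of connections arriving at the server, which is why the two questions give different answers.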
I've been looking at question 2, and you're suggesting that I look at
question 1, except you forgot the normalization factor. I think this is
what you mean. Please correct me if I've misunderstood.
Anyway, here are the results you asked for. I'm NOT including the
normalization factor for easier comparison with your numbers. My diff
numbers are a little different from yours mainly because I charge 1K of
overhead for each file request.
Diff scheme:
days    dspace     ebwidth
-------------------------------
1       12.000K    342.00K
2       24.000K    171.20K
3       36.000K    95.900K
4       48.000K    58.500K
5       60.000K    38.800K
6       72.000K    27.900K
7       84.000K    21.800K
8       96.000K    18.200K
9       108.00K    16.100K
10      120.00K    14.900K
11      132.00K    14.100K
12      144.00K    13.700K
13      156.00K    13.400K
14      168.00K    13.300K
15      180.00K    13.100K
Checksum file scheme with 4-byte checksums:
bsize   dspace     ebwidth
-------------------------------
20      312.50K    173.70K
40      156.30K    89.300K
60      104.20K    62.200K
80      78.100K    49.300K
100     62.500K    42.200K
120     52.100K    37.900K
140     44.600K    35.300K
160     39.100K    33.600K
180     34.700K    32.700K
200     31.300K    32.200K
220     28.400K    32.100K
240     26.000K    32.200K
260     24.000K    32.500K
280     22.300K    33.000K
300     20.800K    33.600K
320     19.500K    34.300K
340     18.400K    35.100K
360     17.400K    35.900K
380     16.400K    36.800K
400     15.600K    37.700K
I'm probably underestimating the bandwidth of the checksum file scheme.
I'm pretty confident about the diff scheme estimates, though.
I think the performance of the two schemes is pretty close. Even though
this looks pretty good for the checksum file scheme, I'm still partial
to the diff scheme because
- The checksum file scheme bottoms out at 32K, but the diff scheme can
reduce transfers to 13K (using more disk space).
- I trust my estimates of the diff scheme more. The rsync scheme will
definitely take more bandwidth than my estimates predict.
- As Debian gets larger, the checksum files will get larger, and so the
bandwidth will get larger. So over time, any advantage of the checksum
file scheme will disappear.
- The diff scheme is more flexible and easier to tune. The checksum
file scheme has a "sweet spot" at 220-byte blocks. Predicting the actual
value of this sweet spot may be hard in the real world.
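The 32K floor and the 220-byte sweet spot can be read straight off the tables. The sketch below just looks up the lowest-bandwidth row in each one; the variable names are my own, and the numbers are copied from the two tables, as (dspace, ebwidth) pairs in K.

```python
# Diff scheme: days of diffs kept -> (dspace, ebwidth), both in K.
diff_scheme = {
    1: (12.0, 342.0), 2: (24.0, 171.2), 3: (36.0, 95.9), 4: (48.0, 58.5),
    5: (60.0, 38.8), 6: (72.0, 27.9), 7: (84.0, 21.8), 8: (96.0, 18.2),
    9: (108.0, 16.1), 10: (120.0, 14.9), 11: (132.0, 14.1),
    12: (144.0, 13.7), 13: (156.0, 13.4), 14: (168.0, 13.3),
    15: (180.0, 13.1),
}

# Checksum file scheme: block size in bytes -> (dspace, ebwidth), both in K.
checksum_scheme = {
    20: (312.5, 173.7), 40: (156.3, 89.3), 60: (104.2, 62.2),
    80: (78.1, 49.3), 100: (62.5, 42.2), 120: (52.1, 37.9),
    140: (44.6, 35.3), 160: (39.1, 33.6), 180: (34.7, 32.7),
    200: (31.3, 32.2), 220: (28.4, 32.1), 240: (26.0, 32.2),
    260: (24.0, 32.5), 280: (22.3, 33.0), 300: (20.8, 33.6),
    320: (19.5, 34.3), 340: (18.4, 35.1), 360: (17.4, 35.9),
    380: (16.4, 36.8), 400: (15.6, 37.7),
}


def lowest_bandwidth(table):
    """Return (key, dspace, ebwidth) for the row with the smallest ebwidth."""
    key = min(table, key=lambda k: table[k][1])
    dspace, ebwidth = table[key]
    return key, dspace, ebwidth


best_days, diff_dspace, diff_bw = lowest_bandwidth(diff_scheme)
best_bsize, sum_dspace, sum_bw = lowest_bandwidth(checksum_scheme)

print(f"diff scheme:     {best_days} days, {diff_dspace}K disk, {diff_bw}K bandwidth")
print(f"checksum scheme: {best_bsize}-byte blocks, {sum_dspace}K disk, {sum_bw}K bandwidth")
```

Note that the diff scheme's best row is simply the last one in the table: its bandwidth is still falling slowly at 15 days, whereas the checksum scheme has a true interior minimum and gets worse on either side of it.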
Best,
Rob