
Re: package pool and big Packages.gz file



On 8 Jan 2001, Goswin Brederlow wrote:

> Then that feature should be limited to non-recursive listings or
> turned off. Or .listing files should be created that are just served.

*cough* rproxy *cough*

> So when you have more blocks, the hash will fill up. So you have more
> hits on the first level and need to search a linked list. With a block
> size of 1K a CD image has 10 items per hash entry; it's 1000% full. The
> time wasted on checking the rolling checksum alone must be huge.

Sure, but that is trivially solvable and is really a minor amount of
time compared with computing the MD4 hashes. In fact, when you start
talking about 650,000 blocks you want to reconsider the design choices
that were made with rsync's searching - it is geared toward small files
and is not really optimal for big ones.
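
Roughly, the search being discussed looks like the sketch below (Python,
simplified; the 16-bit first-level table and the weak/strong split follow
the rsync design described in this thread, but the checksum formulas and
table layout here are illustrative, not rsync's actual code):

    # Sketch of an rsync-style two-level block lookup (simplified).
    # Blocks are indexed by a 16-bit hash of the weak rolling checksum;
    # each slot holds a chain that is scanned comparing the full 32-bit
    # weak checksum, and only on a weak match is the strong checksum
    # computed and compared (rsync uses MD4; md5 is a stand-in here).

    import hashlib

    TABLE_SIZE = 1 << 16            # 65536 first-level slots

    def weak_checksum(block):
        # Adler-32-style rolling checksum (not rsync's exact formula)
        a = sum(block) & 0xffff
        b = sum((len(block) - i) * c for i, c in enumerate(block)) & 0xffff
        return (b << 16) | a

    def strong_checksum(block):
        return hashlib.md5(block).digest()   # rsync uses MD4 here

    def build_table(blocks):
        table = [[] for _ in range(TABLE_SIZE)]
        for idx, block in enumerate(blocks):
            weak = weak_checksum(block)
            slot = (weak ^ (weak >> 16)) & 0xffff
            table[slot].append((weak, strong_checksum(block), idx))
        return table

    def lookup(table, block):
        weak = weak_checksum(block)
        slot = (weak ^ (weak >> 16)) & 0xffff
        for w, strong, idx in table[slot]:    # chain scan
            if w == weak:
                if strong == strong_checksum(block):
                    return idx
        return None

With ~650,000 1K blocks spread over 65,536 slots every chain holds around
ten entries - the "1000% full" figure above - but each chain entry costs
only an integer compare until the weak checksums actually collide, which
is why the chain scan stays cheap next to the MD4 work.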

> So the better the match, the more blocks you have, the more cpu it
> takes. Of course larger blocks take more time to compute an md4sum, but
> you will have fewer blocks then.

No. The smaller the blocks, the more CPU time it will take to compute the
MD4 hashes. Expect MD4 to run at > 100 MB/sec on modern hardware, so you
are looking at burning about 6 seconds of CPU time to verify the local CD
image.

If you start getting lots of 32-bit checksum matches with MD4 mismatches
because the block size is too large, then you could easily double or
triple the number of MD4 calculations you need. That is still totally
dwarfed by the < 10 MB/sec I/O throughput you can expect when copying a
600 MB ISO file.
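
As a back-of-envelope check on those numbers (a sketch only; the
100 MB/sec and 10 MB/sec rates are the assumptions stated above, not
measurements):

    # Rough CPU-vs-I/O comparison for rsyncing a CD image.
    image_size = 650 * 1024 * 1024     # bytes in the local CD image
    md4_rate   = 100 * 1024 * 1024     # assumed MD4 throughput, bytes/sec
    io_rate    =  10 * 1024 * 1024     # assumed disk/copy throughput, bytes/sec

    md4_time       = image_size / md4_rate   # ~6.5 s to checksum the whole image
    worst_md4_time = 3 * md4_time            # tripled by false weak matches: ~20 s
    io_time        = image_size / io_rate    # ~65 s just to move the bytes once

    print(f"MD4: {md4_time:.1f}s  worst case: {worst_md4_time:.1f}s  I/O: {io_time:.1f}s")

Even tripled, the hashing time stays well under the time spent just moving
the bytes.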
 
Jason


