X-Debbugs-Cc: dsilvers@debian.org On Wed, 2003-08-20 at 23:24, Falk Hueffner wrote: > Scott James Remnant <scott@netsplit.com> writes: > > > Falk Hueffner <falk.hueffner@student.uni-tuebingen.de> wrote: > > > > > Daniel Silverstone <dsilvers@digital-scurf.org> writes: > > > > > > > *hauls out... > > > > (a) Practical Algorithms for Programmers, Binstock & Rex > > > > (b) The Art of Computer Programming Vol 3. "Sorting and Searching", Knuth > > > > (c) Introduction to algorithms, Cormen, Leiserson & Rivest > > > > > > Those all fall into "hash functions that really suck". IMHO it should > > > be intuitively obvious that your hash function sucks when it is > > > affected by the bucket size being a power of two, since that means the > > > input value affects upper bits more than lower bits. See also > > > http://burtleburtle.net/bob/hash/doobs.html > > > > All well and good, except the replacement FMV code is at least 20% > > faster. > > I don't know FMV, but if it is a reasonable hash, it shouldn't depend > on prime table sizes. Maybe you could just also benchmark it with a > power-of-two table size? > FMV was a typo, FNV is the correct spelling of that abbreviation. By the way, the URL you provide specifically states that the author knows FNV to be faster than their own hashing algorithm. I've run some proper benchmarks against the two useful machines I had available to me. i686 Athlon 1.7Ghz 1GB RAM. dpkg knows about 904 packages alpha 250Mhz Alpha 64MB RAM. dpkg knows about 493 packages Each test was run 20 times, and the average time computed. The three hashing algorithms were existing The algorithm currently used by dpkg 1.10.10 FNV 8191 FNV algorithm with 8191 bins, Daniel's patch FNV 8192 as above, but with 8192 bins ("power of two") existing FNV 8191 FNV 8192 i686 alpha i686 alpha i686 alpha ------------ ------------ ------------ dpkg -l 2.999 5.672 1.850 4.871 1.913 4.998 dpkg -p dpkg 2.856 5.501 1.792 4.623 1.806 4.891 dpkg -p nonexistant 2.780 5.525 1.755 4.596 1.836 4.639 The results indicate the following improvements could be made by changing the algorithm. i686 alpha ---------------------- ---------------------- existing -> FNV 8191 60% speed increase 19% speed increase existing -> FNV 8192 55% speed increase 15% speed increase FNV 8191 -> FNV 8192 3% speed DECREASE 3% speed DECREASE Firstly this shows that the existing algorithm really needs to be replaced. The i686 is probably a fairly typical high-end user machine, and it gains a 50%-60% speed increase from the new algorithm! An "interesting hardware" machine like the alpha gains a 20% speed increase, and as Vince discovered ARM hardware gets a phenomenal speed boost! Secondly this shows that changing the number of bins from a prime-1 to a power of two DECREASES the speed of the algorithm by 3% on both i386 and alpha. Scott -- Have you ever, ever felt like this? Had strange things happen? Are you going round the twist?
Attachment:
signature.asc
Description: This is a digitally signed message part