[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#206416: dpkg package hash table insufficient



X-Debbugs-Cc: dsilvers@debian.org

On Wed, 2003-08-20 at 23:24, Falk Hueffner wrote:

> Scott James Remnant <scott@netsplit.com> writes:
> 
> > Falk Hueffner <falk.hueffner@student.uni-tuebingen.de> wrote:
> > 
> > > Daniel Silverstone <dsilvers@digital-scurf.org> writes:
> > > 
> > > > *hauls out...
> > > >  (a) Practical Algorithms for Programmers, Binstock & Rex
> > > >  (b) The Art of Computer Programming Vol 3. "Sorting and Searching", Knuth
> > > >  (c) Introduction to algorithms, Cormen, Leiserson & Rivest
> > > 
> > > Those all fall into "hash functions that really suck". IMHO it should
> > > be intuitively obvious that your hash function sucks when it is
> > > affected by the bucket size being a power of two, since that means the
> > > input value affects upper bits more than lower bits. See also
> > > http://burtleburtle.net/bob/hash/doobs.html
> > 
> > All well and good, except the replacement FMV code is at least 20%
> > faster.
> 
> I don't know FMV, but if it is a reasonable hash, it shouldn't depend
> on prime table sizes. Maybe you could just also benchmark it with a
> power-of-two table size?
> 
FMV was a typo, FNV is the correct spelling of that abbreviation.  By
the way, the URL you provide specifically states that the author knows
FNV to be faster than their own hashing algorithm.

I've run some proper benchmarks against the two useful machines I had
available to me.

i686	Athlon 1.7Ghz 1GB RAM.  dpkg knows about 904 packages
alpha	250Mhz Alpha 64MB RAM.  dpkg knows about 493 packages

Each test was run 20 times, and the average time computed.  The three
hashing algorithms were

existing	The algorithm currently used by dpkg 1.10.10
FNV 8191	FNV algorithm with 8191 bins, Daniel's patch 
FNV 8192	as above, but with 8192 bins ("power of two")

                          existing        FNV 8191        FNV 8192
                        i686   alpha    i686   alpha    i686   alpha
                        ------------    ------------    ------------
dpkg -l                 2.999  5.672    1.850  4.871    1.913  4.998
dpkg -p dpkg            2.856  5.501    1.792  4.623    1.806  4.891
dpkg -p nonexistant     2.780  5.525    1.755  4.596    1.836  4.639

The results indicate the following improvements could be made by
changing the algorithm.

			i686			alpha
			----------------------	----------------------
existing -> FNV 8191	60% speed increase	19% speed increase
existing -> FNV 8192	55% speed increase	15% speed increase
FNV 8191 -> FNV 8192	3% speed DECREASE	3% speed DECREASE


Firstly this shows that the existing algorithm really needs to be
replaced.  The i686 is probably a fairly typical high-end user machine,
and it gains a 50%-60% speed increase from the new algorithm!

An "interesting hardware" machine like the alpha gains a 20% speed
increase, and as Vince discovered ARM hardware gets a phenomenal speed
boost!

Secondly this shows that changing the number of bins from a prime-1 to a
power of two DECREASES the speed of the algorithm by 3% on both i386 and
alpha.

Scott
-- 
Have you ever, ever felt like this?
Had strange things happen?  Are you going round the twist?

Attachment: signature.asc
Description: This is a digitally signed message part


Reply to: