# Bug#206416: dpkg package hash table insufficient

```On Wed, Aug 20, 2003 at 10:04:07PM +0100, Daniel Silverstone wrote:
> Let's have a look...
> *hauls out...
>  (c) Introduction to algorithms, Cormen, Leiserson & Rivest
> and opens them up*
> (c) Page 228 "The division method"
>   When using the division method, we usually avoid certain values of m.
>   For example, m should not be a power of 2...
>   ...good values for m are primes, not too close to exact powers of 2.

---

"The multiplication method"

The /multiplication method/ for creating hash functions operates in two
steps. First, we multiply the key /k/ by a constant /A/ in the range 0 < A
< 1 and extract the fractional part of /kA/. Then, we multiply this value
by /m/ and take the floor of the result. In short, the hash function is:

h(k) = |_ m(kA mod 1) _|

where "hA mod 1" means the fractional part of kA, that is kA - |_ kA _|.

An advantage of the multiplication method is that the value of /m/ is not
critical. We typically choose it to be a power of 2 -- m = 2^p for some integer
/p/ -- since we can then easily implement the function on most computers as
follows. [...]

---

Apparently Knuth suggests A = (sqrt(5) - 1) / 2 ~= 0.6180339887...,
so if the word size is w (32 bits, eg), then 0 <= k < 2^w, then

s * 2^w + r = k |_ A * 2^w _|

(note 0 <= s,r < 2^w), and the hash value is the p most significant
bits of r.

For comparison, britney uses:

long strhash(const char *x, unsigned char pow) {
unsigned long i = 0;
while (*x) {
i = (i * 39 + *x) % (1UL << pow);
x++;
}
return i;
}

to generate an index into an array of 2^pow elements from package
names. It works pretty well. pow is generally 14 (ie, a hash size
of 16,384).

Cheers,
aj

--
Anthony Towns <aj@humbug.org.au> <http://azure.humbug.org.au/~aj/>
I don't speak for anyone save myself. GPG signed mail preferred.

``Is this some kind of psych test?
Am I getting paid for this?''
```

Attachment: pgpmqErt6P1Ts.pgp
Description: PGP signature