On Wed, Aug 20, 2003 at 10:04:07PM +0100, Daniel Silverstone wrote:
> Let's have a look...
> *hauls out...
> (c) Introduction to algorithms, Cormen, Leiserson & Rivest
> and opens them up*
> (c) Page 228 "The division method"
> When using the division method, we usually avoid certain values of m.
> For example, m should not be a power of 2...
> ...good values for m are primes, not too close to exact powers of 2.
Reading further:
---
"The multiplication method"
The /multiplication method/ for creating hash functions operates in two
steps. First, we multiply the key /k/ by a constant /A/ in the range 0 < A
< 1 and extract the fractional part of /kA/. Then, we multiply this value
by /m/ and take the floor of the result. In short, the hash function is:
h(k) = |_ m(kA mod 1) _|
where "hA mod 1" means the fractional part of kA, that is kA - |_ kA _|.
An advantage of the multiplication method is that the value of /m/ is not
critical. We typically choose it to be a power of 2 -- m = 2^p for some integer
/p/ -- since we can then easily implement the function on most computers as
follows. [...]
---
Apparently Knuth suggests A = (sqrt(5) - 1) / 2 ~= 0.6180339887...,
so if the word size is w (32 bits, eg), then 0 <= k < 2^w, then
s * 2^w + r = k |_ A * 2^w _|
(note 0 <= s,r < 2^w), and the hash value is the p most significant
bits of r.
For comparison, britney uses:
long strhash(const char *x, unsigned char pow) {
unsigned long i = 0;
while (*x) {
i = (i * 39 + *x) % (1UL << pow);
x++;
}
return i;
}
to generate an index into an array of 2^pow elements from package
names. It works pretty well. pow is generally 14 (ie, a hash size
of 16,384).
Cheers,
aj
--
Anthony Towns <aj@humbug.org.au> <http://azure.humbug.org.au/~aj/>
I don't speak for anyone save myself. GPG signed mail preferred.
``Is this some kind of psych test?
Am I getting paid for this?''
Attachment:
pgpWS4iIkZYwy.pgp
Description: PGP signature