On Wed, Aug 20, 2003 at 10:04:07PM +0100, Daniel Silverstone wrote: > Let's have a look... > *hauls out... > (c) Introduction to algorithms, Cormen, Leiserson & Rivest > and opens them up* > (c) Page 228 "The division method" > When using the division method, we usually avoid certain values of m. > For example, m should not be a power of 2... > ...good values for m are primes, not too close to exact powers of 2. Reading further: --- "The multiplication method" The /multiplication method/ for creating hash functions operates in two steps. First, we multiply the key /k/ by a constant /A/ in the range 0 < A < 1 and extract the fractional part of /kA/. Then, we multiply this value by /m/ and take the floor of the result. In short, the hash function is: h(k) = |_ m(kA mod 1) _| where "hA mod 1" means the fractional part of kA, that is kA - |_ kA _|. An advantage of the multiplication method is that the value of /m/ is not critical. We typically choose it to be a power of 2 -- m = 2^p for some integer /p/ -- since we can then easily implement the function on most computers as follows. [...] --- Apparently Knuth suggests A = (sqrt(5) - 1) / 2 ~= 0.6180339887..., so if the word size is w (32 bits, eg), then 0 <= k < 2^w, then s * 2^w + r = k |_ A * 2^w _| (note 0 <= s,r < 2^w), and the hash value is the p most significant bits of r. For comparison, britney uses: long strhash(const char *x, unsigned char pow) { unsigned long i = 0; while (*x) { i = (i * 39 + *x) % (1UL << pow); x++; } return i; } to generate an index into an array of 2^pow elements from package names. It works pretty well. pow is generally 14 (ie, a hash size of 16,384). Cheers, aj -- Anthony Towns <aj@humbug.org.au> <http://azure.humbug.org.au/~aj/> I don't speak for anyone save myself. GPG signed mail preferred. ``Is this some kind of psych test? Am I getting paid for this?''
Attachment:
pgpWS4iIkZYwy.pgp
Description: PGP signature