Re: Hash seed breaks 5.8.1 binary API; fix suggested

To: Chip Salzenberg <chip@pobox.com>
Cc: Perl 5 Porters <perl5-porters@perl.org>, debian-perl@lists.debian.org
Subject: Re: Hash seed breaks 5.8.1 binary API; fix suggested
From: Nicholas Clark <nick@ccl4.org>
Date: Sat, 11 Oct 2003 10:29:07 +0100
Message-id: <[🔎] 20031011092906.GV14030@plum.flirble.org>
In-reply-to: <[🔎] 20031010024134.GE2004@perlsupport.com>
References: <[🔎] 20031010024134.GE2004@perlsupport.com>

On Thu, Oct 09, 2003 at 10:41:34PM -0400, Chip Salzenberg wrote:

>   3. Add (or xor or whatever) PL_new_hash_seed with the user-provided
>      hash value *inside* the functions that use hashes, *not* outside
>      them.

I think that it has to be "whatever"

It can't be xor. The attack relies on pre-computing a set of values
which are a pathological case for the hashing algorithm, values which
all end up in one bucket. IIRC the mapping to buckets is done using the low
N bits of the 32 bit hash value (where N starts at 3 and increases as the
hash grows).

Given a set of N bit hash values which map to the same bucket, xoring them
all with a constant will still put them in the same bucket. All this would
do is change the order for keys on non-pathological hashes, without actually
splitting the set. Likewise add would just rotate the buckets round, as
effectively it would be modulo arithmetic in base N

I think that we have to do "whatever" where whatever is hashing the input
32 bit PERL_HASH value inside the hv.c functions, seeded by the per
run random value. I'm just not sure what the "whatever" is. We could use
5 bits of it and rotate the 32 bit hash value round. This makes the attack
harder, because then the attacker has to find a set of strings that hash
to the same 32 bit value, rather than just the same bucket. Then again,
if the attacker can do that, anything we do inside hv.c is fruitless, as
we're trying to reliably distinguish between identical values :-(

At which point we may as well calculate a new hash value inside hv.c each
time and kill performance in the general case.

If we don't do that, I believe that information we have to play with is

1: 32 bit PERL_HASH value
2: key length
3: per run 32 bit random seed

How can we generate a new value out of that at little cost?

Nicholas Clark

Reply to:

Follow-Ups:
- Plan B for fixing 5.8.2 binary API
  - From: Chip Salzenberg <chip@debian.org>

References:
- Hash seed breaks 5.8.1 binary API; fix suggested
  - From: Chip Salzenberg <chip@pobox.com>

Prev by Date: Re: Hash seed breaks 5.8.1 binary API; fix suggested
Next by Date: Plan B for fixing 5.8.2 binary API
Previous by thread: Re: Hash seed breaks 5.8.1 binary API; fix suggested
Next by thread: Plan B for fixing 5.8.2 binary API
Index(es):
- Date
- Thread