Plan C (was Re: Plan B for fixing 5.8.2 binary API)
On Mon, Oct 13, 2003 at 09:49:53PM +0100, Nicholas Clark wrote:
> On Mon, Oct 13, 2003 at 12:12:32PM -0400, Chip Salzenberg wrote:
> > According to Nicholas Clark:
> > > On Thu, Oct 09, 2003 at 10:41:34PM -0400, Chip Salzenberg wrote:
> > > > 3. Add (or xor or whatever) PL_new_hash_seed with the user-provided
> > > > hash value *inside* the functions that use hashes, *not* outside them.
> > >
> > > I think that it has to be "whatever" [...]
> > > I believe that information we have to play with is
> > > 1: 32 bit PERL_HASH value
> > > 2: key length
> > > 3: per run 32 bit random seed
> > > How can we generate a new value out of that at little cost?
> > OK, that was eye-opening. Let's be pessimistic and assume that we can't
> > use my initial approach -- at least, not without significant theoretical
> > doubts about its security. How about an alternative approach:
> Unfortunately I think that we can't.
> They generated exploit files. 5.8.0's exploit file is at
> Hence the information we have to play with for all these 10000 strings is
> 1: PERL_HASH value 0
> 2: length 24
> 3: a constant
> which doesn't give us a deterministic way of splitting the input into
> different buckets.
We have 1 further piece of information inside hv.c - we can know that
we are under attack.
If one of the linked lists is excessively long, then we know that someone
is throwing us pathological data.
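The detection idea can be sketched in a few lines of C. This is purely illustrative - the struct name, threshold, and function are invented for the sketch and are not the real hv.c identifiers:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch only: "struct he" and HV_CHAIN_LIMIT are made-up
 * names, not the real hv.c structures or thresholds. */
struct he {
    struct he *next;
};

#define HV_CHAIN_LIMIT 8   /* assumed cutoff for "excessively long" */

/* Walk one bucket's linked list; bail out as soon as it is too long. */
static int chain_is_pathological(const struct he *head) {
    size_t len = 0;
    for (const struct he *e = head; e; e = e->next)
        if (++len > HV_CHAIN_LIMIT)
            return 1;   /* someone is feeding us pathological keys */
    return 0;
}
```

The check costs nothing extra in the normal case, since an insert already walks the bucket's list.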
I'm working on a variant of plans A and B. I need 1 more HV flag. Is
#define SVf_AMAGIC 0x10000000 /* has magical overloaded methods */
ever set on an HV?
Plan C is to add a per-hash flag to say whether the hash is using a non-standard
hash function. This defaults to off (ie standard hash function).
So normally nothing extra happens (except for 1 extra flag check).
Any hash that is flagged as non-standard hashing throws away any passed-in
pre-computed hash, and rehashes using the custom function (which will be
the same for all custom hashes - the algorithm as of 5.8.1 with the random
seed). All this rehashing happens entirely in hv.o, so we keep plan A's
binary compatibility.
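A minimal sketch of that flag dispatch, with invented names (`HVhek_REHASH`, `effective_hash`) standing in for whatever the real flag ends up being called; the seeded function is a one-at-a-time hash of the kind Perl used at the time, started from the random seed:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Illustrative names only - not the real hv.c flag or API. */
#define HVhek_REHASH 0x01u

/* One-at-a-time hash, seeded with the per-run random value. */
static uint32_t seeded_hash(const char *key, size_t len, uint32_t seed) {
    uint32_t h = seed;
    while (len--) {
        h += (uint8_t)*key++;
        h += h << 10;
        h ^= h >> 6;
    }
    h += h << 3;
    h ^= h >> 11;
    h += h << 15;
    return h;
}

/* A flagged hash discards any caller-supplied pre-computed value and
 * rehashes; the default path costs only the one flag check. */
static uint32_t effective_hash(uint32_t flags, uint32_t precomputed,
                               const char *key, size_t len, uint32_t seed) {
    if (flags & HVhek_REHASH)
        return seeded_hash(key, len, seed);
    return precomputed;
}
```

Because the rehashing happens behind this one dispatch point, callers outside hv.o never see a changed interface.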
hsplit is modified to count the length of the longest linked list as it
splits a hash. If it finds that the longest is over some threshold (eg 50%
of all hash values are in one list after splitting) then the data is
pathological, and *that hash* switches strategy.
hsplit sets non-standard hashing, turns off shared hash keys, and iterates
over the entire hash, recalculating hash values (using the "custom"
function).
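The "50% in one list" trigger could be expressed like this; again the function name and calling convention are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch: after a split, scan the bucket chain lengths and
 * decide whether the distribution is pathological.  The threshold here
 * is the "over 50% of all hash values in one list" example from the
 * plan; the real cutoff is a tuning choice. */
static int split_is_pathological(const size_t *bucket_lens, size_t nbuckets,
                                 size_t total_keys) {
    size_t longest = 0;
    for (size_t i = 0; i < nbuckets; i++)
        if (bucket_lens[i] > longest)
            longest = bucket_lens[i];
    return longest * 2 > total_keys;   /* strictly more than half */
}
```

When this fires, the hash would set the non-standard-hashing flag and rehash every entry with the seeded function, after which lookups are O(1) again.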
Sure, that is a speed hit. But it only happens once per hash, and it's
likely to happen on the 8 to 16 bucket split, so there probably aren't
that many values. And after this we get O(1) behaviour again for that
hash (albeit possibly slightly slower if something external is computing
hash keys). It's also unlikely that any perl (or script) generated hashes
will be affected - only hashes filled from (ab)user data.
If a hash is cleared the flags are reset to default.
I think that this scheme has
1: binary compatibility with both 5.8.1 and 5.8.0 (as it's a variant of
Chip's plan A)
2: algorithmic security of present random hashing (as robust as Chip's plan B)
3: but almost no speed hit for non-pathological data
I'm unlikely to have a patch tonight, and I won't be able to work on it