
Re: Plan C (was Re: Plan B for fixing 5.8.2 binary API)

On Thu, Oct 16, 2003 at 11:55:41AM +0100, Alan Burlison wrote:
> Nicholas Clark wrote:
> >Plan C is to add a per-hash flag to say if the hash is using a non-standard
> >hash function. This defaults to off (ie standard hash function).
> >So normally nothing extra happens (except for 1 extra flag check)
> >
> >Any hash that is flagged as non-standard hashing throws away any passed in
> >pre-computed hash, and rehashes using the custom function. (which will be
> >the same for all custom hashes - the algorithm as of 5.8.1 with the random
> >seed). All this rehashing happens entirely in hv.o, so we have plan A
> >behaviour there.
> >
> >hsplit is modified to count the length of the longest linked list as it
> >splits a hash. If it finds that the longest is over some threshold (eg 50%
> >of all hash values are in one list after splitting) then the data is
> >pathological, and *that hash* switches strategy.
> >hsplit sets non-standard hashing, turns off shared hash keys, and iterates
> >over the entire hash, recalculating hash values (using the "custom"
> >function).
> I'm probably being dense, but what does switching behaviour on the fly gain 
> us?  Why not just use the 2nd strategy *all* the time, and avoid the need 
> for a flag check?
> Me no understand...
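To make the detection step concrete, here is roughly the sort of thing meant,
as a simplified, invented C sketch (the names and structures are made up and
look nothing like the real hv.c; the threshold and salted function are
stand-ins for the 5.8.1 algorithm with the random seed):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical "custom" hash: the normal algorithm mixed with a
 * process-wide random seed chosen at startup. */
static uint32_t rehash_seed = 0xdeadbeef;

static uint32_t salted_hash(const char *key, size_t len) {
    uint32_t h = rehash_seed;
    for (size_t i = 0; i < len; i++)
        h = h * 33 + (unsigned char)key[i];
    return h;
}

struct entry { struct entry *next; uint32_t hash; char *key; };

struct hash {
    struct entry **buckets;
    size_t nbuckets;          /* power of two */
    size_t nkeys;
    int uses_custom_hash;     /* the per-hash "rehash" flag */
};

/* After a split, look for a pathological distribution: if more than
 * half of all keys sit in one chain, flip the flag and rehash every
 * key with the salted function. */
static void check_pathological(struct hash *hv) {
    size_t longest = 0;
    for (size_t b = 0; b < hv->nbuckets; b++) {
        size_t len = 0;
        for (struct entry *e = hv->buckets[b]; e; e = e->next)
            len++;
        if (len > longest)
            longest = len;
    }
    if (longest * 2 <= hv->nkeys || hv->uses_custom_hash)
        return;                    /* distribution looks fine */

    hv->uses_custom_hash = 1;      /* this hash now rehashes */
    struct entry **old = hv->buckets;
    hv->buckets = calloc(hv->nbuckets, sizeof *hv->buckets);
    for (size_t b = 0; b < hv->nbuckets; b++) {
        struct entry *e = old[b];
        while (e) {
            struct entry *next = e->next;
            e->hash = salted_hash(e->key, strlen(e->key));
            size_t nb = e->hash & (hv->nbuckets - 1);
            e->next = hv->buckets[nb];
            hv->buckets[nb] = e;
            e = next;
        }
    }
    free(old);
}

/* Demo helper: four keys forced into one chain of an 8-bucket table,
 * i.e. well over the 50% threshold, so the flag should flip. */
static int demo_triggers_rehash(void) {
    struct hash hv = { calloc(8, sizeof(struct entry *)), 8, 4, 0 };
    const char *keys[] = { "Aa", "BB", "Ab", "BA" };
    for (int i = 0; i < 4; i++) {
        struct entry *e = malloc(sizeof *e);
        e->key = (char *)keys[i];
        e->hash = 0;               /* simulate colliding hash values */
        e->next = hv.buckets[0];
        hv.buckets[0] = e;
    }
    check_pathological(&hv);
    return hv.uses_custom_hash;
}
```

The point is that the scan and the full rehash only ever run inside hsplit,
so the cost is confined to hashes that have already proven pathological.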

75% of the hash fetch/store API allows the caller to pass in a pre-computed
hash value. I'm not aware of how many things use this, but I know that
the peephole optimiser changes most string constants that are to be used
for hash lookups into SVs that contain the pre-computed hash value.
I think that Nick I-S reported that the change this was part of
accelerated one of his real-world Tk apps by about 2%, so it seems that in
some cases there are real speed gains to be had from pre-computed hashing.
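To show where that gain comes from, here is an invented sketch (not perl's
real API; the call counter is just instrumentation) of a lookup path that
accepts a caller-supplied hash, with zero standing in for "none":

```c
#include <stdint.h>
#include <string.h>

static unsigned hash_calls;    /* count how often we actually hash */

static uint32_t std_hash(const char *key, size_t len) {
    hash_calls++;
    uint32_t h = 0;
    while (len--)
        h = h * 33 + (unsigned char)*key++;
    return h;
}

/* A lookup entry point in the style described above: the caller may
 * pass a precomputed hash, and the table only hashes when it has to.
 * This is how a constant key, hashed once by the optimiser, avoids
 * being rehashed on every lookup. */
static uint32_t lookup_hash(const char *key, uint32_t precomputed) {
    return precomputed ? precomputed : std_hash(key, strlen(key));
}
```

A key hashed once up front can be looked up any number of times without
`std_hash` running again, which is where the measured win comes from.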

If we always used the second strategy, we would throw away these
pre-computed hash values whenever they are passed in. We would actually be
penalising those callers rather than optimising for them, all for defence
against an uncommon case (an algorithmic attack, or pathological data).
Nicholas Clark
