
Re: Plan C Worked. And there was much rejoicing. (yeah)



On Tue, Oct 21, 2003 at 04:55:54PM -0400, Chip Salzenberg wrote:
> According to Nicholas Clark:
> > Something like this? Only with the rough edges smoothed out?
> 
> In short: Nick++
> 
> In long: Brendan O'Dea, Debian's Perl maintainer, has verified that
> HTML::Parser built under 5.8.0 fails with 5.8.1, but works with 5.8.1
> and the Plan C patch.  Since there's just the one bug and HTML::Parser
> tickled it, this is enough proof for me that Plan C works.
> 
> Brendan has uploaded the patched 5.8.1 to Debian unstable, so it will
> get significant usage, but only by users who are relatively tolerant
> of breakage.  (Debian unstable is ideal for this kind of thing.)
> 
> However, I agree with Nick that some refinement is needed, and I'm
> sure Brendan will be happy to upload again when the next iteration of
> Plan C is available.

OK. I'm not sure what to do. The "traditional" logic for triggering
a hash split is:

    if (i) {                            /* initial entry? */
        xhv->xhv_fill++; /* HvFILL(hv)++ */
        if (xhv->xhv_keys > (IV)xhv->xhv_max /* HvKEYS(hv) > HvMAX(hv) */)
            hsplit(hv);
    }

where i is true if the insert has just created a new linked list, i.e.
the entry went into a previously empty bucket.
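
For concreteness, here is a minimal sketch of a chained-hash insert
showing where a flag like i comes from. This is not the real hv.c code;
all the names (toy_hash, toy_entry, toy_store) are made up:

    #include <stdlib.h>

    typedef struct toy_entry {
        struct toy_entry *next;
        unsigned long hash;
        const char *key;
    } toy_entry;

    typedef struct {
        toy_entry **buckets;
        unsigned long max;      /* number of buckets - 1, like HvMAX */
        long fill;              /* buckets in use, like HvFILL */
        long keys;              /* total entries, like HvKEYS */
    } toy_hash;

    static void toy_store(toy_hash *hv, const char *key, unsigned long hash)
    {
        toy_entry **oentry = &hv->buckets[hash & hv->max];
        int i = (*oentry == NULL);  /* true iff this bucket had no chain */
        toy_entry *entry = malloc(sizeof *entry);

        if (!entry)
            return;
        entry->hash = hash;
        entry->key  = key;
        entry->next = *oentry;      /* push onto the front of the chain */
        *oentry = entry;
        hv->keys++;

        if (i) {                    /* initial entry: new chain created */
            hv->fill++;
            if (hv->keys > (long)hv->max)
                ;                   /* hsplit(hv) would go here */
        }
    }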

5.8.1 has one of the three instances of this logic changed to the
following; maint now has all three changed:

    if (i) {                            /* initial entry? */
        xhv->xhv_fill++; /* HvFILL(hv)++ */
    } else if (xhv->xhv_keys > (IV)xhv->xhv_max /* HvKEYS(hv) > HvMAX(hv) */) {
        hsplit(hv);
    }

because previously a hash never got split if you kept adding to existing
chains. If I understand the code correctly, now it gets split if both

1: you've just added to an existing linked list
2: there are now more entries in the hash than buckets

(but there is no check on linked list length, nor on how well distributed
the keys are over the used buckets; see the walk-through below)
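
To make those two conditions concrete, here is a tiny walk-through with
made-up numbers (8 buckets, 9 keys); purely illustrative:

    #include <stdio.h>

    int main(void)
    {
        /* hypothetical state: 8 buckets (max = 7), 9 keys stored */
        int i = 0;              /* 0: this insert extended an existing chain */
        long keys = 9, max = 7;

        if (i)
            puts("new chain: just fill++");
        else if (keys > max)
            puts("hsplit: keys > buckets, chain length never examined");
        return 0;
    }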

I was thinking of taking out this test on keys > buckets completely and
instead triggering the split on linked list length, with the rehash
kicking in if, even after the split, the list that triggered it is still
overlong. But now I'm not so sure. This doesn't feel like a very
conservative change; does it sound like it has serious performance
implications?
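
Something like the following sketch is what I mean. The names
chain_length(), rehash() and SPLIT_THRESHOLD are invented for
illustration, and toy_hash stands in for the real HV:

    #define SPLIT_THRESHOLD 8   /* assumed acceptable chain length */

    struct toy_hash { struct toy_entry **buckets; unsigned long max; };

    unsigned long chain_length(struct toy_hash *hv, unsigned long idx);
    void hsplit(struct toy_hash *hv);   /* doubles the bucket array */
    void rehash(struct toy_hash *hv);   /* switch to randomised hashing */

    static void after_insert(struct toy_hash *hv, unsigned long hash)
    {
        unsigned long idx = hash & hv->max;

        if (chain_length(hv, idx) > SPLIT_THRESHOLD) {
            hsplit(hv);
            idx = hash & hv->max;       /* max has doubled; recompute */
            /* if the chain that tripped the test is still overlong,
             * the keys collide on the hash value itself, so rehash */
            if (chain_length(hv, idx) > SPLIT_THRESHOLD)
                rehash(hv);
        }
    }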

Without the rehash there would be a danger that an attacker could keep
adding keys with the same hash value: each time the hash would split
(doubling the memory use) while the linked list just gets longer (still
a linear search).
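
The arithmetic shows why splitting alone cannot help there: the bucket
index is hash & max, so keys with identical hash values share a bucket at
every table size. A throwaway demonstration:

    #include <stdio.h>

    int main(void)
    {
        /* hypothetical: every attacker-supplied key hashes to h */
        unsigned long h = 0xdeadbeefUL;
        unsigned long max = 7;          /* 8 buckets to start */
        int round;

        for (round = 0; round < 4; round++) {
            printf("max = %3lu -> bucket %lu\n", max, h & max);
            max = max * 2 + 1;          /* what each split does to the mask */
        }
        /* the bucket index may change as max grows, but every key with
           hash h still shares it: memory doubles, search stays linear */
        return 0;
    }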

However, with the code now in, the split would fail to make the list any
shorter, so that would be the trigger for a rehash, and the keys should
then all spread out nicely.

(I expect that there needs to be a defence against bad luck: if the
random hashing has kicked in and you still end up with everything in a
single list, you use the keys > buckets logic of old rather than
triggering constant splits.)
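
A guard along those lines might look like this sketch; is_rehashed and
should_split() are invented names:

    struct guard_hash { unsigned long max; long keys; int is_rehashed; };

    static int should_split(struct guard_hash *hv, unsigned long chain_len,
                            unsigned long threshold)
    {
        if (!hv->is_rehashed)
            return chain_len > threshold;   /* chain-length trigger */
        /* random hashing already active: an overlong chain is plain
         * bad luck, so fall back to the old keys > buckets rule
         * instead of splitting over and over */
        return hv->keys > (long)hv->max;
    }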

Nicholas Clark


