
Re: reliable reproducer, was Re: core dump analysis



On Thu, 20 Apr 2023, Michael Schmitz wrote:

> Can you try and fault in as many of these stack pages as possible, ahead 
> of filling the stack? (Depending on how much RAM you have ...). Maybe we 
> would need to lock those pages into memory? Just to show that with no 
> page faults (but still signals) there is no corruption?
> 

OK.
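
For reference, a minimal sketch of what pre-faulting and locking the stack 
could look like. This is not the actual test program: prefault_stack() and 
lock_pages() are hypothetical helper names, and the sketch assumes Linux, 
where the stack mapping stays grown after the function returns. Note that 
mlockall() can fail without sufficient privilege or RLIMIT_MEMLOCK.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: reserve `bytes` of stack in one go and write one
 * byte per page-sized stride, from the highest offset down to offset 0,
 * so that the page faults happen here rather than during later recursion.
 * Returns the number of bytes written (one per stride).
 */
size_t prefault_stack(size_t bytes)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t touched = 0;

    assert(page > 0);
    if (bytes == 0)
        return 0;

    volatile char buf[bytes];   /* variable-length array on the stack */
    size_t off = bytes;

    while (off > 0) {
        off = (off > (size_t)page) ? off - (size_t)page : 0;
        buf[off] = 0;           /* volatile write forces the access */
        touched++;
    }
    return touched;
}

/*
 * Hypothetical helper: lock all current and future mappings into RAM.
 * Returns 0 on success, -1 on failure (errno is set by mlockall()).
 */
int lock_pages(void)
{
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}
```

Calling prefault_stack() with a size just under the stack limit before 
starting the recursion, optionally followed by lock_pages(), should mean 
the recursion itself takes no further stack page faults.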

> > Any signal frames or exception frames have been completely overwritten 
> > because the recursion continued after the corruption took place. So 
> > there's not much to see in the core dump.
> 
> We'd need a way to stop recursion once the first corruption has taken 
> place. If the 'safe' recursion depth of 10131 is constant, the dump 
> taken at that point should look similar to what you saw in dash 
> (assuming it is the page fault and subsequent signal return that causes 
> the corruption).
> 

It turns out that the recursion depth can be set a lot lower than the 
200000 that I chose in that test program. (I used that value as it kept 
the stack size just below the default 8192 kB limit.)

At depth = 2500, a failure is around 95% certain. At depth = 2048 I can 
still get an intermittent failure. This required only 21 stack page faults 
and one fork.
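
A rough sketch of how the relationship between recursion depth and page 
faults could be measured. This is not the test program discussed above: 
recurse() and stack_faults_for_depth() are hypothetical names, the 32-byte 
pad is an arbitrary choice, and the real frame size depends on compiler and 
architecture. The count comes from ru_minflt, which covers all minor faults 
in the process during the call, not only stack faults.

```c
#include <sys/resource.h>
#include <sys/time.h>

/*
 * Hypothetical recursion: each level keeps a small volatile buffer alive
 * across the recursive call so the compiler cannot collapse the frames.
 * Returns the number of levels entered (equal to depth for depth >= 1).
 */
long recurse(long depth)
{
    volatile char pad[32];
    long below;

    pad[0] = (char)depth;
    if (depth <= 1)
        return 1;
    below = recurse(depth - 1);
    /* pad[0] still holds (char)depth, so the XOR term is always 0 */
    return below + 1 + (pad[0] ^ (char)depth);
}

/*
 * Hypothetical helper: number of minor page faults taken by the process
 * while recursing to `depth`, or -1 if getrusage() fails.
 */
long stack_faults_for_depth(long depth)
{
    struct rusage before, after;
    volatile long reached;

    if (getrusage(RUSAGE_SELF, &before) != 0)
        return -1;
    reached = recurse(depth);
    (void)reached;
    if (getrusage(RUSAGE_SELF, &after) != 0)
        return -1;
    return after.ru_minflt - before.ru_minflt;
}
```

As a sanity check on the figures above, and assuming 4 KiB pages: 8192 kB 
divided by 200000 levels is about 42 bytes per frame, and 21 pages spread 
over 2048 levels also comes to 42 bytes per frame.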

I suspect that the location of the corruption is somewhat random, and that 
the larger the stack happens to be when the signal comes in, the better the 
odds of detecting it.

