
Re: reliable reproducer, was Re: core dump analysis



On Thu, 20 Apr 2023, Michael Schmitz wrote:

> Am 20.04.2023 um 17:17 schrieb Finn Thain:
> > On Thu, 20 Apr 2023, Michael Schmitz wrote:
> >
> >>>
> >>> As with dash, the corruption lies at the page boundary.
> >>
> >> Which implies a page fault handled at the page boundary.
> >>
> >> Can you try and fault in as many of these stack pages as possible, ahead
> >> of filling the stack? (Depending on how much RAM you have ...). Maybe we
> >> would need to lock those pages into memory? Just to show that with no
> >> page faults (but still signals) there is no corruption?
> >>
> >
> > I modified the test program to execute rec() to full depth with no
> > forking, then do it again with forking.
> >
> > root@(none):/root# while ./stack-test 5000 ; do : ; done
> > starting recursion
> > done.
> > starting recursion with fork
> > done.
> > starting recursion
> > done.
> > starting recursion with fork
> > Illegal instruction
> > root@(none):/root#
> >
> > I can't get this to crash during the first descent. The second descent
> > always crashes, given sufficient depth:
> >
> > root@(none):/root# while ./stack-test 50000 ; do : ; done
> > starting recursion
> > done.
> > starting recursion with fork
> > Illegal instruction
> >
> > So all the stack pages would have been faulted in well before the 
> > failure shows up. It appears to be the signal that's the problem and 
> > not the page fault. That's not surprising considering the PC in the 
> > signal frame in the dash crash was a MOVEM saving registers onto the 
> > stack.
> 
> Well, without locking the faulted in pages in memory we can't be sure 
> they were not swapped back out. Unless I misunderstand what's involved 
> in that ...
> 

There was no swap enabled.

50000 frames * 36 bytes per frame == 1.8 MB

