
Bug#584314: base: System freezes at random time after Resume from Suspend (Regression)



notfound 584314 linux-2.6/2.6.32-32
notfixed 584314 linux-2.6/2.6.38-3
found 584314 linux-2.6/2.6.32-30
fixed 584314 linux-2.6/2.6.38-5
quit

Andreas Berger wrote:

> in linux-image-2.6.32-5-686, version 2.6.32-30, the bug was still there,
> in linux-image-2.6.38-2-686, version 2.6.38-5, the bug was no longer there,
>
> in between the two, i don't know, but if it helps, i can narrow it down as 
> soon as i get home to a spare hard drive.

Sure, it would help to narrow the search for the fix (but see below to
save some time).

> On Thursday, July 28, 2011 04:19:39 Jonathan Nieder wrote:

>>  - could you send a photo of the screen during the oops, so we can read
>>    the backtrace?
>
> i typed it off the screen and included it in my previous mail here: 
> http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=22;bug=584314
>
> is that not what you mean?

Unfortunately what you typed doesn't include the call trace (or maybe
there was none).  It does include the code, which when passed through
scripts/decodecode looks like this:

| kernel:[ 496.263433] Code: 04 01 00 00 00 66 83 7c 24 28 00 79 37 89 f5 31 db eb 2b ba 03 00 00 00 89 e8 e8 ee 73 fa ff b9 00 04 00 00 89 04 24 89 c7 31 c0 <f3> ab 8b 04 24 ba 03 00 00 00 43 83 c5 20 e8 20 72 fa ff 3b 5c
[...]
|   11:	eb 2b                	jmp    0x3e
|   13:	ba 03 00 00 00       	mov    $0x3,%edx
|   18:	89 e8                	mov    %ebp,%eax
|   1a:	e8 ee 73 fa ff       	callq  0xfffffffffffa740d
|   1f:	b9 00 04 00 00       	mov    $0x400,%ecx
|   24:	89 04 24             	mov    %eax,(%rsp)
|   27:	89 c7                	mov    %eax,%edi
|   29:	31 c0                	xor    %eax,%eax
|   2b:*	f3 ab                	rep stos %eax,%es:(%rdi)     <-- trapping instruction
|   2d:	8b 04 24             	mov    (%rsp),%eax
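
As a cross-check on the listing, the trapping offset can be recomputed
from the Code: line alone by counting the bytes before the <..> marker.
A minimal sketch in C; trap_offset is a hypothetical helper written for
this mail, not part of scripts/decodecode, and it expects only the hex
bytes (without the "Code:" prefix):

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Return the byte offset of the <xx> marker in the hex dump of an oops
 * "Code:" line, or -1 if there is no marker.  Hypothetical helper for
 * illustration only.  Pass just the hex bytes, not the "Code:" prefix.
 */
int trap_offset(const char *hex)
{
    int offset = 0;

    assert(hex != NULL);
    for (const char *p = hex; *p != '\0'; ) {
        if (*p == '<')
            return offset;          /* marker found: bytes seen so far */
        if (isxdigit((unsigned char)p[0]) && isxdigit((unsigned char)p[1])) {
            offset++;               /* one two-digit hex byte consumed */
            p += 2;
        } else {
            p++;                    /* skip separators */
        }
    }
    return -1;
}
```

For the dump quoted above this gives 43, i.e. 0x2b, which matches the
line marked with "*" in the disassembly.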

Building mm/page_alloc.s and comparing, we see that this is in
"clear_highpage"; the function call sequence starting at offset 13
(argument setup, then the call itself at 1a) is to kmap_atomic, and
the trapping rep stos is memset(page, 0, PAGE_SIZE).
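
To see why the rep stos corresponds to a whole-page memset: %ecx is
loaded with 0x400 and each store writes the 4-byte %eax (zero), so
0x400 * 4 = 4096 bytes are cleared.  A rough userspace sketch, assuming
the usual 4096-byte page on i386; rep_stosl and zeroes_whole_page are
names made up for illustration, not kernel functions:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u   /* assumption: i386 page size */

/* Emulate "rep stos %eax,%es:(%edi)": store value, count times. */
void rep_stosl(uint32_t *dst, uint32_t value, size_t count)
{
    while (count--)
        *dst++ = value;
}

/*
 * Return 1 if 0x400 dword stores of zero leave a buffer in the same
 * state as memset(buf, 0, PAGE_SIZE), including one guard word past
 * the end of the page that must stay untouched.
 */
int zeroes_whole_page(void)
{
    static uint32_t a[PAGE_SIZE / 4 + 1], b[PAGE_SIZE / 4 + 1];

    memset(a, 0xAA, sizeof a);      /* poison both buffers first */
    memset(b, 0xAA, sizeof b);
    rep_stosl(a, 0, 0x400);         /* what the trapping code does */
    memset(b, 0, PAGE_SIZE);        /* what the C source says */
    return memcmp(a, b, sizeof a) == 0;
}
```

So the trap happens while writing zeroes into a freshly mapped page,
not while reading anything from it.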

Unwinding a little: clear_highpage is called by prep_zero_page,
which is called by prep_new_page, which is called by buffered_rmqueue,
which is called by get_page_from_freelist for each potentially
free page.

I suspect memory corruption.  Maybe v2.6.37-rc5~3^2 (PM / Hibernate:
Fix memory corruption related to swap, 2010-12-03) fixes it.  Could
you test 2.6.37-rc5 and 2.6.37-rc4?

Thanks,
Jonathan


