Bug#631187: Kernel panics when removing external hard drive
Hi,
Alexander Kurtz wrote:
> On Wed, 2011-06-22 at 03:40 +0100, Ben Hutchings wrote:
>> The panic message shows there was an earlier kernel warning; please can
>> you provide that.
>
> Thanks to netconsole (a really great tool!) I was able to do so. The
> attached kernel log starts right before I plug the drive in.
> Surprisingly the kernel didn't crash the first time, but after trying
> again, everything went as expected (see lines 17 and 35).
Sorry for the long silence. Let's see:
> [ 1421.182657] sd 7:0:0:0: [sdc] Attached SCSI disk
> [ 1454.865926] WARNING! power/level is deprecated; use power/control instead
Seems harmless enough.
> [ 1478.728383] sd 8:0:0:0: [sdc] Attached SCSI disk
> [ 1491.693027] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
> [ 1491.693229] IP: [<ffffffff8118b2e3>] elv_completed_request+0x38/0x47
The panic.
[...]
> [ 1491.696825] Code: 40 74 35 83 7e 44 01 74 04 a8 40 74 2b 83 e0 11 ff c8 0f 95 c0 83 e0 01 48 05 fc 00 00 00 ff 4c 87 04 f6 46 41 04 74 10 48 8b 02
> [ 1491.696825] 8b 40 48 48 85 c0 74 04 41 58 ff e0 59 c3 48 8d be 80 00 00
> [ 1491.696825] RIP [<ffffffff8118b2e3>] elv_completed_request+0x38/0x47
Disassembly, for convenience (following the hints from
Documentation/oops-tracing.txt):
| <+0>: rex je 0x6008b8 <str+56>
| <+3>: cmpl $0x1,0x44(%rsi)
| <+7>: je 0x60088d <str+13>
| <+9>: test $0x40,%al
| <+11>: je 0x6008b8 <str+56>
| <+13>: and $0x11,%eax
| <+16>: dec %eax
| <+18>: setne %al
| <+21>: and $0x1,%eax
| <+24>: add $0xfc,%rax
| <+30>: decl 0x4(%rdi,%rax,4)
| <+34>: testb $0x4,0x41(%rsi)
| <+38>: je 0x6008b8 <str+56>
| <+40>: mov (%rdx),%rax
| <+43>: mov 0x48(%rax),%rax
| <+47>: test %rax,%rax
| <+50>: je 0x6008b8 <str+56>
| <+52>: pop %r8
| <+54>: jmpq *%rax
| <+56>: pop %rcx
| <+57>: retq
| <+58>: lea 0x80(%rsi),%rdi
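The offsets in the listing count from the first byte of the Code: dump,
while the +0x38 in the trace counts from the start of
elv_completed_request. A small Python sketch for converting between the
two, assuming (as is usual for x86 oopses, but not stated in this log)
that the first Code: line holds exactly the bytes in front of the
faulting instruction. The helper names are made up for illustration.

```python
import re

# Bytes the log shows ahead of the faulting instruction (first Code: line).
CODE_BEFORE_IP = (
    "40 74 35 83 7e 44 01 74 04 a8 40 74 2b 83 e0 11 ff c8 0f 95 "
    "c0 83 e0 01 48 05 fc 00 00 00 ff 4c 87 04 f6 46 41 04 74 10 "
    "48 8b 02"
)
# From "IP: ... elv_completed_request+0x38/0x47".
IP_FUNC_OFFSET = 0x38


def parse_hex(dump):
    """Return the byte values in a hex dump, ignoring any <> markers."""
    return [int(tok, 16) for tok in re.findall(r"\b[0-9a-fA-F]{2}\b", dump)]


def dump_start(ip_func_offset, bytes_before_ip):
    """Function-relative offset of the first byte in the Code: dump."""
    return ip_func_offset - bytes_before_ip


def to_dump_offset(func_offset, start):
    """Convert a function-relative offset to a listing (<+N>) offset."""
    return func_offset - start


def to_func_offset(dump_offset, start):
    """Convert a listing (<+N>) offset to a function-relative offset."""
    return dump_offset + start


before = parse_hex(CODE_BEFORE_IP)
start = dump_start(IP_FUNC_OFFSET, len(before))

print("dump starts at function offset", hex(start))
print("faulting IP is listing offset +%d" % to_dump_offset(IP_FUNC_OFFSET, start))
print("listing offset +56 is function offset", hex(to_func_offset(56, start)))
```

With the numbers from this report, the dump starts at function offset
0xd, so +0x38 in the trace corresponds to <+43> in the listing.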
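The 0x38 in the trace counts from the start of the function, while the
listing above counts from the start of the Code: dump, which begins 43
bytes before the faulting instruction. So the fault is at <+43>, just
after the jump at <+38> for the first half of
if ((rq->cmd_flags & REQ_SORTED) &&
As for why that involves an access to the address 0x48: the bytes at
that position (8b 40 48, presumably behind a REX.W prefix) decode as a
load from 0x48(%rax), so my guess is that %rax, fetched from (%rdx) one
instruction earlier, was NULL. It is unlikely to be rq itself, since
rq->cmd_flags was already accessed in the check
if (blk_account_rq(rq))
Which pointer that is at the source level is beyond my depth. I suppose
if I were in this situation, I'd sprinkle
block/elevator.c::elv_completed_request with printk calls to be able to
witness exactly what happens.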
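As a guess at why the fault address is 0x48 rather than 0: loading a
struct member through a NULL pointer faults at the member's offset, not
at address zero. The Python sketch below uses a made-up table of ten
pointer-sized slots (not the real kernel structure, and the field names
are hypothetical) just to show the arithmetic. Which pointer was
actually NULL here is not something the log alone settles.

```python
import ctypes


class HypotheticalOps(ctypes.Structure):
    """Stand-in for a table of function pointers; NOT the kernel's layout."""
    _fields_ = [("slot%d" % i, ctypes.c_uint64) for i in range(9)] + [
        ("completed_req_fn", ctypes.c_uint64),
    ]


def fault_address(base, struct, member):
    """Address touched when 'member' is loaded through pointer 'base'."""
    return base + getattr(struct, member).offset


NULL = 0
addr = fault_address(NULL, HypotheticalOps, "completed_req_fn")
print(hex(addr))
```

Nine 8-byte slots put the tenth member at offset 0x48, which is the
kind of layout that would produce the address in the oops.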
Sorry for the trouble, and hope that helps.
Jonathan