Bug#631187: Kernel panics when removing external hard drive



Hi,

Alexander Kurtz wrote:
> On Wed, 2011-06-22 at 03:40 +0100, Ben Hutchings wrote:

>> The panic message shows there was an earlier kernel warning; please can
>> you provide that.
>
> Thanks to netconsole (a really great tool!) I was able to do so. The
> attached kernel log starts right before I plug the drive in.
> Surprisingly the kernel didn't crash the first time, but after trying
> again, everything went as expected (see lines 17 and 35).

Sorry for the long silence.  Let's see:

> [ 1421.182657] sd 7:0:0:0: [sdc] Attached SCSI disk
> [ 1454.865926] WARNING! power/level is deprecated; use power/control instead

Seems harmless enough.

> [ 1478.728383] sd 8:0:0:0: [sdc] Attached SCSI disk
> [ 1491.693027] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
> [ 1491.693229] IP: [<ffffffff8118b2e3>] elv_completed_request+0x38/0x47

The panic.

[...]
> [ 1491.696825] Code: 40 74 35 83 7e 44 01 74 04 a8 40 74 2b 83 e0 11 ff c8 0f 95 c0 83 e0 01 48 05 fc 00 00 00 ff 4c 87 04 f6 46 41 04 74 10 48 8b 02 
> [ 1491.696825]  8b 40 48 48 85 c0 74 04 41 58 ff e0 59 c3 48 8d be 80 00 00 
> [ 1491.696825] RIP  [<ffffffff8118b2e3>] elv_completed_request+0x38/0x47

Disassembly, for convenience (following the hints from
Documentation/oops-tracing.txt):

| <+0>:     rex je 0x6008b8 <str+56>
| <+3>:     cmpl   $0x1,0x44(%rsi)
| <+7>:     je     0x60088d <str+13>
| <+9>:     test   $0x40,%al
| <+11>:    je     0x6008b8 <str+56>
| <+13>:    and    $0x11,%eax
| <+16>:    dec    %eax
| <+18>:    setne  %al
| <+21>:    and    $0x1,%eax
| <+24>:    add    $0xfc,%rax
| <+30>:    decl   0x4(%rdi,%rax,4)
| <+34>:    testb  $0x4,0x41(%rsi)
| <+38>:    je     0x6008b8 <str+56>
| <+40>:    mov    (%rdx),%rax
| <+43>:    mov    0x48(%rax),%rax
| <+47>:    test   %rax,%rax
| <+50>:    je     0x6008b8 <str+56>
| <+52>:    pop    %r8
| <+54>:    jmpq   *%rax
| <+56>:    pop    %rcx
| <+57>:    retq   
| <+58>:    lea    0x80(%rsi),%rdi
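
In case anyone wants to reproduce the listing without typing the bytes
into a C string by hand, here is a rough Python sketch that turns the
Code: bytes into a raw file for objdump.  The helper names and the file
name oops-code.bin are made up for illustration.  The handling of a
byte in angle brackets assumes the usual oops convention of marking the
byte at the faulting address that way; the copy of the log quoted here
has no such marker.

```python
import re

# Code: bytes as quoted in this message, with the log prefixes removed.
CODE_TEXT = """
40 74 35 83 7e 44 01 74 04 a8 40 74 2b 83 e0 11 ff c8 0f 95 c0 83 e0 01
48 05 fc 00 00 00 ff 4c 87 04 f6 46 41 04 74 10 48 8b 02
8b 40 48 48 85 c0 74 04 41 58 ff e0 59 c3 48 8d be 80 00 00
"""

TOKEN = re.compile(r"<([0-9a-fA-F]{2})>|([0-9a-fA-F]{2})")


def parse_code_bytes(text):
    """Return (raw bytes, index of the <xx>-marked byte or None)."""
    raw = bytearray()
    marked = None
    for token in text.split():
        match = TOKEN.fullmatch(token)
        if match is None:
            raise ValueError("not a hex byte: %r" % token)
        if match.group(1) is not None:
            # Assumed convention: <xx> marks the byte at the faulting address.
            marked = len(raw)
        raw.append(int(match.group(1) or match.group(2), 16))
    return bytes(raw), marked


def write_blob(data, path="oops-code.bin"):
    """Write the raw bytes to a file (hypothetical name) for objdump."""
    with open(path, "wb") as handle:
        handle.write(data)
    return path


if __name__ == "__main__":
    raw, marked = parse_code_bytes(CODE_TEXT)
    path = write_blob(raw)
    print(len(raw), "bytes, marked byte index:", marked)
    # Usual options for disassembling a raw x86-64 blob:
    print("objdump -D -b binary -m i386:x86-64", path)
```

The dump starts in the middle of an instruction, so the first line or
two of any disassembly made this way should be taken with a grain of
salt.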

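If, as usual, the dump starts 43 bytes before the faulting address,
then offset 0x38 in the function is <+43> above, just after the
REQ_SORTED test in

		if ((rq->cmd_flags & REQ_SORTED) &&

The bytes there (8b 40 48) are a load from 0x48(%rax), with %rax
fetched from (%rdx) by the instruction before.  A NULL %rax would
explain the access to the address 0x48; presumably this is the second
half of that condition.  Why the pointer would be NULL is beyond my
depth.  rq->cmd_flags itself was already accessed in the check

	if (blk_account_rq(rq))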

Maybe the actual cause of the fault is some different instruction and
the instruction pointer is not to be trusted (?).  I suppose if I were
in this situation, I'd sprinkle block/elevator.c::elv_completed_request
with printk calls to be able to witness exactly what happens.
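
For what it's worth, the arithmetic for matching the listing offsets
against the "+0x38/0x47" in the oops can be sketched as below.  It
assumes the dump begins 43 bytes before the faulting address, which is
what I believe x86 kernels of this vintage do with the default 64-byte
dump; the function names are my own.

```python
FAULT_OFFSET = 0x38   # from "elv_completed_request+0x38/0x47"
FUNCTION_SIZE = 0x47  # likewise
PROLOGUE_LEN = 43     # assumption: bytes dumped before the faulting address

# Offset within the function of the first byte in the Code: dump.
DUMP_START = FAULT_OFFSET - PROLOGUE_LEN


def to_function_offset(listing_offset):
    """Map a <+N> offset from the listing to an offset in the function."""
    return DUMP_START + listing_offset


def to_listing_offset(function_offset):
    """Map an offset in the function to a <+N> offset in the listing."""
    return function_offset - DUMP_START


if __name__ == "__main__":
    print("dump starts at function offset", hex(DUMP_START))
    print("fault is at listing offset", to_listing_offset(FAULT_OFFSET))
    print("function ends at listing offset", to_listing_offset(FUNCTION_SIZE))
```

With those numbers the function would end at <+58> in the listing,
right after the retq at <+57>, which is at least consistent with the
assumption.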

Sorry for the trouble, and hope that helps.
Jonathan
