
Bug#804079: linux-image-3.16.0-4-amd64: Kernel panic on Xen virtualisation in Debian



Control: found -1 3.16.7-ckt11-1+deb8u5
Control: notfound -1 3.16.7-ckt11-1

Thanks for your report.

On Wed, 2015-11-04 at 18:53 +0100, Jan Prunk wrote:
> Package: src:linux
> Version: 3.16.7-ckt11-1

From your text and the screenshot I think the version should really be
3.16.7-ckt11-1+deb8u5. I've updated the bug metadata with the Control
lines at the top of this mail.

> Severity: important
> 
> Dear Maintainer,


> The following kernel panic error appears at random in Xen
> virtualisation.

Do you mean it has appeared at random from time to time (i.e. more than
once), or have you seen only a single instance?

> Please look at the error in screenshot attachment.
> It's a Debian 8, Kernel 3.16.7-ckt11-1+deb8u5, Xen 4.4.4-pre

The screenshot shows a fault at 0xffffffff812b6dad == memcpy+0xd,
called from ndisc_send_redirect+0x3bf.

Unfortunately disassembling memcpy from what I think is the correct dbg
package[0] results in:

Dump of assembler code for function memcpy:
   0xffffffff812b6da0 <+0>:	mov    %rdi,%rax
   0xffffffff812b6da3 <+3>:	cmp    $0x20,%rdx
   0xffffffff812b6da7 <+7>:	jb     0xffffffff812b6e27 <memcpy+135>
   0xffffffff812b6da9 <+9>:	cmp    %dil,%sil
   0xffffffff812b6dac <+12>:	jl     0xffffffff812b6de3 <memcpy+67>
   0xffffffff812b6dae <+14>:	sub    $0x20,%rdx
   0xffffffff812b6db2 <+18>:	sub    $0x20,%rdx

i.e. the faulting %rip (0xffffffff812b6dad) is not on an instruction
boundary (it would be in the middle of that jl instruction, which
cannot happen).
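
To make that check reproducible, here is a minimal sketch in Python. The
instruction start addresses are copied by hand from the disassembly
above; the names (INSN_STARTS, locate) are just mine for illustration.

```python
# Instruction start addresses copied from the memcpy disassembly above.
MEMCPY_BASE = 0xffffffff812b6da0
INSN_STARTS = [
    0xffffffff812b6da0,  # <+0>  mov
    0xffffffff812b6da3,  # <+3>  cmp
    0xffffffff812b6da7,  # <+7>  jb
    0xffffffff812b6da9,  # <+9>  cmp
    0xffffffff812b6dac,  # <+12> jl
    0xffffffff812b6dae,  # <+14> sub
    0xffffffff812b6db2,  # <+18> sub
]
FAULT_RIP = 0xffffffff812b6dad


def locate(rip, starts):
    """Return (start, byte offset) of the listed instruction containing rip.

    Only meaningful while rip lies before the end of the last listed
    instruction, which is the case here.
    """
    containing = max(s for s in starts if s <= rip)
    return containing, rip - containing


start, inside = locate(FAULT_RIP, INSN_STARTS)
print(f"fault at memcpy+{FAULT_RIP - MEMCPY_BASE:#x}")
print(f"on a boundary: {FAULT_RIP in INSN_STARTS}")
print(f"inside insn at memcpy+{start - MEMCPY_BASE}, byte {inside}")
```

This reports the fault as one byte into the instruction at memcpy+12,
i.e. the jl.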

The call in ndisc_send_redirect disassembles sensibly and matches up
ok.

If I decode the faulting address as if it were on an instruction
boundary then I get:

(gdb) x/i 0xffffffff812b6dad
   0xffffffff812b6dad <memcpy+13>:	xor    $0x20ea8348,%eax

which isn't accessing RAM and therefore surely cannot fault.
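
For anyone wanting to see where that bogus xor comes from, the following
Python sketch rebuilds the relevant bytes. Note the assumption: the
bytes are reconstructed from the disassembly using the standard x86-64
encodings, not read out of the vmlinux image itself.

```python
import struct

# Bytes reconstructed from the disassembly, NOT read from vmlinux:
#   jl  memcpy+67    -> 7c <rel8>     (0x7c is jl with an 8-bit offset)
#   sub $0x20,%rdx   -> 48 83 ea 20   (REX.W prefix, opcode 83 /5, imm8)
JL_ADDR = 0xffffffff812b6dac
JL_TARGET = 0xffffffff812b6de3

# rel8 is relative to the address of the instruction after the 2-byte jl.
rel8 = JL_TARGET - (JL_ADDR + 2)
code = bytes([0x7c, rel8, 0x48, 0x83, 0xea, 0x20])

# Start decoding one byte in, at the faulting %rip (memcpy+13).
misaligned = code[1:]
opcode = misaligned[0]
(imm32,) = struct.unpack_from("<I", misaligned, 1)

# Opcode 0x35 is "xor $imm32,%eax": it touches only a register.
print(f"rel8={rel8:#x} opcode={opcode:#x} imm32={imm32:#x}")
```

The jl's displacement byte happens to be 0x35, which is also the opcode
for xor with %eax, and the following sub supplies the 32-bit immediate,
which matches what gdb prints above.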

The version you have given is corroborated by the screenshot and I am
pretty sure I have got the correct dbg package to match.

I suppose you haven't rebuilt the kernel or anything like that?

I don't like to put things down to "cosmic rays", but if this was a
one-off then I'm struggling to think of anything else to explain what
appears to be a single bit error in %rip.
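
As a rough illustration of the single bit theory (it proves nothing
about what actually happened), this Python sketch lists which of the
legitimate instruction boundaries in the memcpy disassembly differ from
the faulting %rip by exactly one bit. The offsets are copied from the
gdb output; the helper name is mine.

```python
MEMCPY_BASE = 0xffffffff812b6da0
FAULT_RIP = 0xffffffff812b6dad

# Offsets of instruction starts, taken from the gdb disassembly of memcpy.
INSN_OFFSETS = [0, 3, 7, 9, 12, 14, 18]


def bits_differing(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


single_bit = [
    off for off in INSN_OFFSETS
    if bits_differing(MEMCPY_BASE + off, FAULT_RIP) == 1
]

for off in single_bit:
    addr = MEMCPY_BASE + off
    print(f"memcpy+{off} ({addr:#x}) differs in bit mask {addr ^ FAULT_RIP:#x}")
```

Two boundaries come out as one flipped bit away from the faulting
address: memcpy+9 (the cmp) and memcpy+12 (the jl).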

At this point I would normally ask if you had run memtest86 etc. on the
machine (i.e. if the RAM is known to be solid), but this looks like a
problem with a register rather than with memory.

> It's a production machine so not much detailed further testing can be
> provided in time.
> The information below (bugreport) is executed from a different
> machine, so the info provided below is not matching the original
> machine where the error appears !

FYI it is possible to run reportbug on a machine but get it to write
the report to a file for transfer and sending from another machine.

Ian.

[0] http://security.debian.org/debian-security/pool/updates/main/l/linux/linux-image-3.16.0-4-amd64-dbg_3.16.7-ckt11-1+deb8u5_amd64.deb
 => /usr/lib/debug/vmlinux-3.16.0-4-amd64

