Bug#804079: linux-image-3.16.0-4-amd64: Kernel panic on Xen virtualisation in Debian
Control: found -1 3.16.7-ckt11-1+deb8u5
Control: notfound -1 3.16.7-ckt11-1
Thanks for your report.
On Wed, 2015-11-04 at 18:53 +0100, Jan Prunk wrote:
> Package: src:linux
> Version: 3.16.7-ckt11-1
From your text and the screenshot I think this should really be
+deb8u5. I've updated the bug metadata with the first lines.
> Severity: important
>
> Dear Maintainer,
> The following kernel panic error appears at random in Xen
> virtualisation.
Do you mean it has appeared randomly from time to time (i.e. more than
once), or have you had a single random instance?
> Please look at the error in screenshot attachment.
> It's a Debian 8, Kernel 3.16.7-ckt11-1+deb8u5, Xen 4.4.4-pre
The screenshot shows a fault at 0xffffffff812b6dad == memcpy+0xd,
called from ndisc_send_redirect+0x3bf.
Unfortunately disassembling memcpy from what I think is the correct dbg
package[0] results in:
Dump of assembler code for function memcpy:
0xffffffff812b6da0 <+0>: mov %rdi,%rax
0xffffffff812b6da3 <+3>: cmp $0x20,%rdx
0xffffffff812b6da7 <+7>: jb 0xffffffff812b6e27 <memcpy+135>
0xffffffff812b6da9 <+9>: cmp %dil,%sil
0xffffffff812b6dac <+12>: jl 0xffffffff812b6de3 <memcpy+67>
0xffffffff812b6dae <+14>: sub $0x20,%rdx
0xffffffff812b6db2 <+18>: sub $0x20,%rdx
i.e. the faulting %rip (0xffffffff812b6dad) is not on an instruction
boundary (it would be in the middle of that jl instruction, which
cannot happen).
The call in ndisc_send_redirect disassembles sensibly and matches up
ok.
If I decode the faulting address as if it were on an instruction
boundary then I get:
(gdb) x/i 0xffffffff812b6dad
0xffffffff812b6dad <memcpy+13>: xor $0x20ea8348,%eax
which isn't accessing RAM and therefore surely cannot fault.
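The decode above can be cross-checked by hand. The following is only a sketch: the byte values are reconstructed from the disassembly listing using the standard x86-64 encodings (0x7c for a short jl, 48 83 ea 20 for sub $0x20,%rdx), not read out of the vmlinux image, and the function names are made up for illustration.

```python
import struct

MEMCPY = 0xffffffff812b6da0          # start of memcpy, from the listing
FAULT_RIP = MEMCPY + 0xd             # memcpy+0xd, the faulting address


def reconstruct_bytes():
    """Rebuild the bytes at memcpy+12 .. memcpy+17 from the listing.

    Assumption: jl is the 2-byte short form (opcode 0x7c + rel8), which
    is consistent with the next instruction starting at memcpy+14.
    """
    jl_target = MEMCPY + 67          # <memcpy+67>
    next_insn = MEMCPY + 14          # instruction following the jl
    rel8 = jl_target - next_insn     # displacement is relative to next insn
    jl_bytes = bytes([0x7C, rel8])
    sub_bytes = bytes([0x48, 0x83, 0xEA, 0x20])  # sub $0x20,%rdx
    return jl_bytes + sub_bytes


def decode_misaligned():
    """Decode the stream starting one byte into the jl instruction."""
    stream = reconstruct_bytes()
    misaligned = stream[FAULT_RIP - (MEMCPY + 12):]
    opcode = misaligned[0]
    # Opcode 0x35 is "xor imm32, %eax"; the immediate is little-endian.
    imm32 = struct.unpack_from("<I", misaligned, 1)[0]
    return opcode, imm32


if __name__ == "__main__":
    opcode, imm32 = decode_misaligned()
    print(f"{FAULT_RIP:#x}: opcode {opcode:#04x}, imm32 {imm32:#010x}")
```

The jl displacement byte doubles as the xor opcode, and the four bytes of the following sub become its immediate, which is why gdb shows xor $0x20ea8348,%eax at that address.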
The version you have given is corroborated by the screenshot and I am
pretty sure I have got the correct dbg package to match.
I suppose you haven't rebuilt the kernel or anything like that?
I don't like to put things down to "cosmic rays", but if this was a
one-off then I'm struggling to think of anything else to explain what
appears to be a single bit error in %rip.
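To illustrate the single-bit hypothesis, here is a rough sketch that flips each bit of the faulting %rip in turn and checks whether the result lands on an instruction boundary. It only knows about the boundaries visible in the listing above (including the two jump targets), so it is an illustration under that assumption rather than a full analysis of memcpy; the names are hypothetical.

```python
MEMCPY = 0xffffffff812b6da0
FAULT_RIP = 0xffffffff812b6dad

# Instruction-start offsets taken from the partial disassembly, plus the
# two branch targets it mentions. Anything beyond memcpy+18 is unknown here.
BOUNDARY_OFFSETS = [0, 3, 7, 9, 12, 14, 18, 67, 135]


def single_bit_neighbours(addr, bits=64):
    """Return {bit: address} for every address one bit flip away."""
    return {bit: addr ^ (1 << bit) for bit in range(bits)}


def plausible_origins():
    """List (bit, address) pairs where a flip yields a known boundary."""
    boundaries = {MEMCPY + off for off in BOUNDARY_OFFSETS}
    neighbours = single_bit_neighbours(FAULT_RIP)
    return [(bit, addr) for bit, addr in sorted(neighbours.items())
            if addr in boundaries]


if __name__ == "__main__":
    for bit, addr in plausible_origins():
        print(f"bit {bit}: {addr:#x} = memcpy+{addr - MEMCPY}")
```

With the boundaries listed, flipping bit 0 gives memcpy+12 (the jl) and flipping bit 2 gives memcpy+9 (the cmp), so a one-bit corruption of a valid %rip is at least arithmetically consistent with the address in the screenshot.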
At this point I would normally ask if you had run memtest86 etc on the
machine (i.e. if the RAM is known to be solid), but this seems to be a
register and not memory related.
> It's a production machine so not much detailed further testing can be
> provided in time.
> The information below (bugreport) is executed from a different
> machine, so the info provided below is not matching the original
> machine where the error appears !
FYI it is possible to run reportbug on a machine but get it to write
the report to a file for transfer and sending from another machine.
Ian.
[0] http://security.debian.org/debian-security/pool/updates/main/l/linux/linux-image-3.16.0-4-amd64-dbg_3.16.7-ckt11-1+deb8u5_amd64.deb
=> /usr/lib/debug/vmlinux-3.16.0-4-amd64