Bug#602991: [squeeze openvz] kernel crash with null pointer dereference while umounting nfs

To: George Barnett <gbarnett@atlassian.com>
Cc: 602991@bugs.debian.org
Subject: Bug#602991: [squeeze openvz] kernel crash with null pointer dereference while umounting nfs
From: Jonathan Nieder <jrnieder@gmail.com>
Date: Wed, 15 Feb 2012 13:40:00 -0600
Message-id: <20120215193927.GA23759@burratino>
Reply-to: Jonathan Nieder <jrnieder@gmail.com>, 602991@bugs.debian.org
In-reply-to: <5D889DFD-77C6-47F8-B59E-2C9DF530DD0D@atlassian.com>
References: <5D889DFD-77C6-47F8-B59E-2C9DF530DD0D@atlassian.com>

Hi,

George Barnett wrote:

> We maintain a large number of OpenVZ containers on several hosts.
> In the course of running these containers, we keep a number of NFS
> mounts which are presented into the OpenVZ containers.
>
> We currently have 3 test machines we are able to test this on.  All
> are running the same image, netbooted.  The stack trace below is
> from a 2 x 12 core AMD box, although we see the exact same crash
> with the same cause on the Intel test nodes too (2 x X5650 6 core).
>
> When we stop all the containers quickly on a host, we see the following repeatable crash:
>
> [  317.100898] CT: 10018: stopped
> [  317.912269] BUG: unable to handle kernel NULL pointer dereference at (null)
> [  317.916307] IP: [<ffffffff812ea21e>] _spin_lock_bh+0xe/0x25
> [  317.916307] PGD 100a5d0067 PUD 100a557067 PMD 0
> [  317.916307] Oops: 0002 [#1] SMP
> [  317.916307] last sysfs file: /sys/devices/system/cpu/cpu23/cache/index2/shared_cpu_map
[...]
> [  317.916307] Pid: 7838, comm: umount Not tainted 2.6.32-5-openvz-amd64 #1 dyomin H8DGU
> [  317.916307] RIP: 0010:[<ffffffff812ea21e>]  [<ffffffff812ea21e>] _spin_lock_bh+0xe/0x25
[...]
> [  317.916307] Call Trace:
> [  317.916307]  [<ffffffffa01b97a4>] ? rpc_wake_up_queued_task+0x12/0x29 [sunrpc]
> [  317.916307]  [<ffffffffa01b9835>] ? rpc_killall_tasks+0x7a/0x9b [sunrpc]
> [  317.916307]  [<ffffffffa0217fed>] ? nfs_umount_begin+0x34/0x3a [nfs]
> [  317.916307]  [<ffffffff81106844>] ? sys_umount+0x11b/0x2e6
> [  317.916307]  [<ffffffff812ec6a5>] ? do_page_fault+0x2e0/0x2fc
> [  317.916307]  [<ffffffff81010c12>] ? system_call_fastpath+0x16/0x1b
> [  317.916307] Code: e9 ff 5b c3 53 48 89 fb e8 a6 a4 d6 ff 48 89 df
> f0 83 2f 01 79 05 e8 42 73 e9 ff 5b c3 53 48 89 fb e8 8d a4 d6 ff b8
> 00 00 01 00 <f0> 0f c1 03 0f b7 d0 c1 e8 10 39 c2 74 07 f3 90 0f b7 13
> eb f5

Thanks for a clear report.  Do you still have access to these systems?
If so, can you still reproduce this?

If this bug is still present, our best bet is probably to get help
from OpenVZ upstream, which might involve trying a different kernel
(a RHEL-based one converted with alien) or a newer one (3.x.y).

Sorry for the trouble,
Jonathan

Follow-Ups:
- Bug#602991: [squeeze openvz] kernel crash with null pointer dereference while umounting nfs
  - From: George Barnett <gbarnett@atlassian.com>
