Bug#602991: [squeeze openvz] kernel crash with null pointer dereference while umounting nfs



Hi,

George Barnett wrote:

> We maintain a large number of OpenVZ containers on several hosts.
> In the course of running these containers, we keep a number of NFS
> mounts which are presented into the OpenVZ containers.
>
> We currently have 3 test machines we are able to test this on.  All
> are running the same image, netbooted.  The stack trace below is
> from a 2 x 12 core AMD box, although we see the exact same crash
> with the same cause on the Intel test nodes too (2 x X5650 6 core).
>
> When we stop all the containers quickly on a host, we see the following repeatable crash:
>
> [  317.100898] CT: 10018: stopped
> [  317.912269] BUG: unable to handle kernel NULL pointer dereference at (null)
> [  317.916307] IP: [<ffffffff812ea21e>] _spin_lock_bh+0xe/0x25
> [  317.916307] PGD 100a5d0067 PUD 100a557067 PMD 0
> [  317.916307] Oops: 0002 [#1] SMP
> [  317.916307] last sysfs file: /sys/devices/system/cpu/cpu23/cache/index2/shared_cpu_map
[...]
> [  317.916307] Pid: 7838, comm: umount Not tainted 2.6.32-5-openvz-amd64 #1 dyomin H8DGU
> [  317.916307] RIP: 0010:[<ffffffff812ea21e>]  [<ffffffff812ea21e>] _spin_lock_bh+0xe/0x25
[...]
> [  317.916307] Call Trace:
> [  317.916307]  [<ffffffffa01b97a4>] ? rpc_wake_up_queued_task+0x12/0x29 [sunrpc]
> [  317.916307]  [<ffffffffa01b9835>] ? rpc_killall_tasks+0x7a/0x9b [sunrpc]
> [  317.916307]  [<ffffffffa0217fed>] ? nfs_umount_begin+0x34/0x3a [nfs]
> [  317.916307]  [<ffffffff81106844>] ? sys_umount+0x11b/0x2e6
> [  317.916307]  [<ffffffff812ec6a5>] ? do_page_fault+0x2e0/0x2fc
> [  317.916307]  [<ffffffff81010c12>] ? system_call_fastpath+0x16/0x1b
> [  317.916307] Code: e9 ff 5b c3 53 48 89 fb e8 a6 a4 d6 ff 48 89 df
> f0 83 2f 01 79 05 e8 42 73 e9 ff 5b c3 53 48 89 fb e8 8d a4 d6 ff b8
> 00 00 01 00 <f0> 0f c1 03 0f b7 d0 c1 e8 10 39 c2 74 07 f3 90 0f b7 13
> eb f5

Thanks for a clear report.  Do you still have access to these systems?
If so, can you still reproduce this?

If this bug is still present, our best bet is probably to get help
from OpenVZ upstream, which might involve trying a different
(RHEL-based, converted with alien) or newer (3.x.y) kernel.

Sorry for the trouble,
Jonathan


