
Bug#763192: [LXC] [nfsd] kernel crash when running nfs-kernel-server in one LXC Container



Package: nfs-kernel-server
Version: 1:1.2.6-4
Severity: serious

Hi dear maintainer,

I have a problem with nfs-kernel-server (Debian Wheezy) when it is installed in an LXC container (Debian Wheezy) on a Debian Jessie host with a very recent kernel (3.16-2-amd64).

The configuration is hosted on an IBM eServer x3400 (2 CPUs, 4 cores).

- config 1:  host configuration
--------------------------------

admlocal@srv-alex:~$ uname -a
Linux srv-alex 3.16-2-amd64 #1 SMP Debian 3.16.3-2 (2014-09-20) x86_64 GNU/Linux

admlocal@srv-alex:~$ cat /etc/debian_version
jessie/sid

admlocal@srv-alex:~$ dpkg -l |grep libc6
ii  libc6:amd64      2.19-11    amd64  GNU C Library: Shared libraries
ii  libc6-dev:amd64  2.19-11    amd64  GNU C Library: Development Libraries and Header Files
ii  libcompfaceg1    1:1.5.2-5  amd64  Compress/decompress images for mailheaders, libc6 runtime


Please note that Jessie is up to date at the time of writing this email.

admlocal@srv-alex:~$ cat /etc/apt/sources.list |grep -v "#"

deb     http://ftp.fr.debian.org/debian/ jessie main contrib non-free
deb-src http://ftp.fr.debian.org/debian/ jessie main contrib non-free

root@srv-alex:~# lsmod |grep nfs
nfsv3                  37551  1
nfs                   187961  2 nfsv3
fscache                45542  1 nfs
nfsd                  263053  9
auth_rpcgss            51240  1 nfsd
nfs_acl                12511  2 nfsd,nfsv3
lockd                  83417  3 nfs,nfsd,nfsv3
sunrpc                237445  30 nfs,nfsd,auth_rpcgss,lockd,nfsv3,nfs_acl

- config 2:  container configuration
-----------------------------------

Using LXC, I have installed one amd64 container based on the stable Debian distribution (Wheezy 7.6), in order to be sure of having a stable version of the user-space daemons.

root@vm-wheezy-x86-amd64-3:~# dpkg -l |grep libc6
ii  libc6:amd64      2.13-38+deb7u4  amd64  Embedded GNU C Library: Shared libraries
ii  libc6-dbg:amd64  2.13-38+deb7u4  amd64  Embedded GNU C Library: detached debugging symbols
ii  libc6-dev:amd64  2.13-38+deb7u4  amd64  Embedded GNU C Library: Development Libraries and Header Files
ii  libcompfaceg1    1:1.5.2-5       amd64  Compress/decompress images for mailheaders, libc6 runtime

root@vm-wheezy-x86-amd64-3:~# dpkg -l |grep nfs
ii  libnfsidmap2:amd64  0.25-4     amd64  NFS idmapping library
ii  nfs-common          1:1.2.6-4  amd64  NFS support files common to client and server
ii  nfs-kernel-server   1:1.2.6-4  amd64  support for NFS kernel server

root@vm-wheezy-x86-amd64-3:~# cat /etc/exports  |grep -v '#'
/tmp    *(rw,sync,no_subtree_check)


The mount points inside the container are as follows:

root@vm-wheezy-x86-amd64-3:~# df -h
Filesystem                                       Size  Used Avail Use% Mounted on
rootfs                                           190M   38M  142M  22% /
/dev/mapper/vg_wheezy_x86_amd64_3-lv_rootfs      190M   38M  142M  22% /
/dev/mapper/vg_raid_0-lv_tmp_wheezy_x86_amd64_3  4.8G   11M  4.6G   1% /tmp
/dev/mapper/vg_wheezy_x86_amd64_3-lv_var         575M  176M  370M  33% /var
/dev/mapper/vg_wheezy_x86_amd64_3-lv_usr         3.4G  2.7G  528M  84% /usr
tmpfs                                            599M   56K  599M   1% /run
tmpfs                                            5.0M     0  5.0M   0% /run/lock
tmpfs                                            1.2G     0  1.2G   0% /run/shm

root@vm-wheezy-x86-amd64-3:~# /etc/init.d/nfs-kernel-server restart
[ ok ] Stopping NFS kernel daemon: mountd nfsd.
[ ok ] Unexporting directories for NFS kernel daemon....
[ ok ] Exporting directories for NFS kernel daemon....
[ ok ] Starting NFS kernel daemon: nfsd mountd.

Since my container cannot insert the nfsd kernel module itself, the module is loaded on the host at boot time via /etc/modules.
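
To confirm on the host that nfsd is really loaded before the container starts, a check along the following lines could be used. This is only a minimal sketch: the helper name module_loaded is hypothetical (it is not part of any package), and it simply matches the first column of lsmod-style output read from standard input.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the module named in $1 appears in the
# first column of lsmod-style output read from stdin.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Intended use on the host, before starting the container:
#   lsmod | module_loaded nfsd || echo "nfsd is not loaded" >&2
```

Matching the whole first column avoids a false positive from modules such as nfs or nfsv3, which a plain "grep nfs" would also report.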

The container starts and the last daemon (nfsd) runs, but dmesg shows the following messages:

[  160.421748] ------------[ cut here ]------------
[  160.421777] WARNING: CPU: 5 PID: 4638 at /build/linux-P15SNz/linux-3.16.3/fs/nfsd/nfs4recover.c:1195 nfsd4_umh_cltrack_init+0x3a/0x40 [nfsd]()
[  160.421779] NFSD: attempt to initialize umh client tracking in a container!
[  160.421900] Modules linked in: veth nfsv3 nfs fscache binfmt_misc bridge 8021q garp stp mrp llc iTCO_wdt iTCO_vendor_support ppdev joydev raid0 radeon ttm coretemp drm_kms_helper psmouse drm kvm_intel evdev i5000_edac pcspkr i2c_algo_bit edac_coe serio_raw kvm parport_pc parport shpchp i2c_i801 i2c_core i5k_amb lpc_ich mfd_core rng_core processor thermal_sys button nfsd auth_rpcgss oid_registry nfs_acl lockd sunrpc loop autofs4 ext4 crc16 mbcache jbd2 dm_mod raid1 md_mod hid_generic usbhi hid sg sd_mod crc_t10dif crct10dif_generic ses enclosure crct10dif_common ehci_pci uhci_hcd ehci_hcd tg3 ptp aacraid usbcore pps_core libphy usb_common scsi_mod
[  160.421953] CPU: 5 PID: 4638 Comm: rpc.nfsd Tainted: G I 3.16-2-amd64 #1 Debian 3.16.3-2
[  160.421955] Hardware name: IBM IBM eServer x3400-[7976ABG]-/M97IP, BIOS IBM BIOS Version 1.62-[SPE162AUS-1.62]- 11/09/2007
[  160.421958]  0000000000000009 ffffffff81506188 ffff8801b6f87d98 ffffffff81065707
[  160.421961]  ffff8800b9df7600 ffff8801b6f87de8 ffff8800b9df7600 0000000000000008
[  160.421963]  0000000000000000 ffffffff8106576c ffffffffa02e27a8 0000000000000018
[  160.421967] Call Trace:
[  160.421975]  [<ffffffff81506188>] ? dump_stack+0x41/0x51
[  160.421980]  [<ffffffff81065707>] ? warn_slowpath_common+0x77/0x90
[  160.421983]  [<ffffffff8106576c>] ? warn_slowpath_fmt+0x4c/0x50
[  160.421991]  [<ffffffffa02dd1aa>] ? nfsd4_umh_cltrack_init+0x3a/0x40 [nfsd]
[  160.421998]  [<ffffffffa02de461>] ? nfsd4_client_tracking_init+0x81/0x130 [nfsd]
[  160.422006]  [<ffffffffa02d8a62>] ? nfs4_state_start_net+0x2a2/0x340 [nfsd]
[  160.422013]  [<ffffffffa02b3b20>] ? nfsd_svc+0x1d0/0x330 [nfsd]
[  160.422019]  [<ffffffffa02b4600>] ? write_pool_threads+0x260/0x260 [nfsd]
[  160.422025]  [<ffffffffa02b468a>] ? write_threads+0x8a/0xf0 [nfsd]
[  160.422031]  [<ffffffff8113ecca>] ? __get_free_pages+0xa/0x50
[  160.422035]  [<ffffffff811ca5e0>] ? simple_transaction_get+0xa0/0xc0
[  160.422041]  [<ffffffffa02b4093>] ? nfsctl_transaction_write+0x43/0x70 [nfsd]
[  160.422045]  [<ffffffff811a52f2>] ? vfs_write+0xb2/0x1f0
[  160.422048]  [<ffffffff811a5e32>] ? SyS_write+0x42/0xa0
[  160.422052]  [<ffffffff8150c26d>] ? system_call_fast_compare_end+0x10/0x15
[  160.422086] WARNING: CPU: 5 PID: 4638 at /build/linux-P15SNz/linux-3.16.3/fs/nfsd/nfs4recover.c:530 nfsd4_legacy_tracking_init+0x1aa/0x240 [nfsd]()
[  160.422087] NFSD: attempt to initialize legacy client tracking in a container!

After killing all processes, stopping the container and removing the kernel module, I ran the same scenario again:

root@srv-alex:~# rmmod nfsd
root@srv-alex:~# rmmod nfsd
rmmod: ERROR: Module nfsd is not currently loaded
root@srv-alex:~# lsmod |grep nfs
nfsv3                  37551  1
nfs                   187961  2 nfsv3
fscache                45542  1 nfs
nfs_acl                12511  1 nfsv3
lockd                  83417  2 nfs,nfsv3
sunrpc                237445  18 nfs,auth_rpcgss,lockd,nfsv3,nfs_acl

root@srv-alex:~# modprobe nfsd nfs4_disable_idmapping=0

Then I restarted the container:
lxc-start -f /etc/lxc/auto/vm-wheezy-x86-amd64-3 -n vm-wheezy-x86-amd64-3

The LXC container config is:
root@srv-alex:~# cat /etc/lxc/auto/vm-wheezy-x86-amd64-3   |grep -v "#"

---------------------- container config --------------

lxc.arch                 = amd64
lxc.utsname              = vm-wheezy-x86-amd64-3
lxc.start.auto           = 1

lxc.tty                  = 4
lxc.pts                  = 1024
lxc.rootfs               = /var/lib/lxc/vm-wheezy-x86-amd64-3/rootfs
lxc.cgroup.devices.deny  = a
lxc.cgroup.devices.allow = c 1:3 rwm
lxc.cgroup.devices.allow = c 1:5 rwm
lxc.cgroup.devices.allow = c 5:1 rwm
lxc.cgroup.devices.allow = c 5:0 rwm
lxc.cgroup.devices.allow = c 4:0 rwm
lxc.cgroup.devices.allow = c 4:1 rwm
lxc.cgroup.devices.allow = c 1:9 rwm
lxc.cgroup.devices.allow = c 1:8 rwm
lxc.cgroup.devices.allow = c 136:* rwm
lxc.cgroup.devices.allow = c 5:2 rwm
lxc.cgroup.devices.allow = c 10:235 rwm
lxc.cgroup.devices.allow = c 254:0 rwm
lxc.mount.entry = proc /var/lib/lxc/vm-wheezy-x86-amd64-3/rootfs/proc proc nodev,noexec,nosuid 0 0
lxc.mount.entry = devpts /var/lib/lxc/vm-wheezy-x86-amd64-3/rootfs/dev/pts devpts defaults 0 0
lxc.mount.entry = sysfs /var/lib/lxc/vm-wheezy-x86-amd64-3/rootfs/sys sysfs defaults 0 0

lxc.cgroup.cpuset.cpus   = 1-7

lxc.network.type         = veth
lxc.network.flags        = up
lxc.network.link         = br-admin
lxc.network.name         = eth0-admin
lxc.network.hwaddr       = 02:00:00:02:01:00
lxc.network.veth.pair    = e0-wham64adm

lxc.network.type         = veth
lxc.network.flags        = up
lxc.network.link         = br-services
lxc.network.name         = eth1-services
lxc.network.hwaddr       = 02:00:00:02:01:01
lxc.network.veth.pair    = e1-wham64srv

lxc.network.type         = veth
lxc.network.flags        = up
lxc.network.link         = br-users
lxc.network.name         = eth2-users
lxc.network.hwaddr       = 02:00:00:02:01:02
lxc.network.veth.pair    = e2-wham64usr
---------------------- container config END --------------


... and the problem occurs again with the same error.

Of course, since there is a fatal error in the kernel, the container cannot be restarted, because of the temporary name used when creating the network interface (a problem unrelated to NFS, I hope!). The server must then be rebooted, or all nfsd processes must be killed with signal 9 (SIGKILL).

The container NFS config is:

root@srv-alex:~# cat /var/lib/lxc/vm-wheezy-x86-amd64-3/rootfs/etc/default/nfs-common |grep -v '#'
-------------------------- start config ----------------------
NEED_STATD=yes

STATDOPTS="--port 32766 --outgoing-port 32765"

NEED_IDMAPD=no

NEED_GSSD=no
-------------------------- end config ----------------------
root@srv-alex:~# cat /var/lib/lxc/vm-wheezy-x86-amd64-3/rootfs/etc/default/nfs-kernel-server |grep -v '#'
-------------------------- start config ----------------------
RPCNFSDCOUNT=8

RPCNFSDPRIORITY=0

RPCMOUNTDOPTS='--manage-gids --port 32767 --num-threads=6 --no-nfs-version 4'

NEED_SVCGSSD=no

RPCSVCGSSDOPTS=

-------------------------- end config ----------------------

The log when booting the container is:
-------------------------- start log ----------------------
root@srv-alex:~# lxc-start -f /etc/lxc/auto/vm-wheezy-x86-amd64-3 -n vm-wheezy-x86-amd64-3
INIT: version 2.88 booting
Using makefile-style concurrent boot in runlevel S.
Setting the system clock.
hwclock: Cannot access the Hardware Clock via any known method.
hwclock: Use the --debug option to see the details of our search for an access method.
Unable to set System Clock to: Sun Sep 28 17:31:07 CEST 2014 ... (warning).
Activating swap...done.
Cleaning up temporary files... /tmp /lib/init/rw.
Mount point '/dev/console' does not exist. Skipping mount. ... (warning).
Mount point '/dev/tty1' does not exist. Skipping mount. ... (warning).
Mount point '/dev/tty2' does not exist. Skipping mount. ... (warning).
Mount point '/dev/tty3' does not exist. Skipping mount. ... (warning).
Mount point '/dev/tty4' does not exist. Skipping mount. ... (warning).
Mount point '/dev/ptmx' does not exist. Skipping mount. ... (warning).
Activating lvm and md swap...done.
Checking file systems...fsck from util-linux 2.20.1
done.
Mounting local filesystems...done.
/etc/init.d/mountall.sh: 59: kill: Illegal number: 3 1
Activating swapfile swap...done.
Cleaning up temporary files....
Setting kernel variables ...done.
Configuring network interfaces...Internet Systems Consortium DHCP Client 4.2.2
Copyright 2004-2011 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/
Listening on LPF/eth0-admin/02:00:00:02:01:00
Sending on   LPF/eth0-admin/02:00:00:02:01:00
Sending on   Socket/fallback
DHCPDISCOVER on eth0-admin to 255.255.255.255 port 67 interval 5
DHCPREQUEST on eth0-admin to 255.255.255.255 port 67
DHCPOFFER from 192.168.9.8
DHCPACK from 192.168.9.8
bound to 192.168.9.29 -- renewal in 25 seconds.
if-up.d/mountnfs[eth0-admin]: waiting for interface eth1-services before doing NFS mounts ... (warning).
Internet Systems Consortium DHCP Client 4.2.2
Copyright 2004-2011 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/

Listening on LPF/eth1-services/02:00:00:02:01:01
Sending on   LPF/eth1-services/02:00:00:02:01:01
Sending on   Socket/fallback
DHCPDISCOVER on eth1-services to 255.255.255.255 port 67 interval 8
DHCPREQUEST on eth1-services to 255.255.255.255 port 67
DHCPOFFER from 192.168.6.2
DHCPACK from 192.168.6.2

Debian GNU/Linux 7 vm-wheezy-x86-amd64-3 console

vm-wheezy-x86-amd64-3 login: root
Password:
Last login: Sun Sep 28 17:15:01 CEST 2014 on console
Linux vm-wheezy-x86-amd64-3 3.16-2-amd64 #1 SMP Debian 3.16.3-2 (2014-09-20) x86_64


-------------------------- end log ----------------------


I also tried setting RPCNFSDCOUNT=1, and the result is the following error:
----------------------------start trace -------------------------------
[ 2996.630588] ------------[ cut here ]------------
[ 2996.630608] WARNING: CPU: 7 PID: 11074 at /build/linux-P15SNz/linux-3.16.3/fs/nfsd/nfs4recover.c:1195 nfsd4_umh_cltrack_init+0x3a/0x40 [nfsd]()
[ 2996.630609] NFSD: attempt to initialize umh client tracking in a container!
[ 2996.630729] Modules linked in: nfsd veth nfsv3 nfs fscache binfmt_misc bridge 8021q garp stp mrp llc iTCO_wdt iTCO_vendor_support ppdev joydev raid0 radettm coretemp drm_kms_helper psmouse drm kvm_intel evdev i5000_edac pcspkr i2c_algo_bit edac_core serio_raw kvm parport_pc parport shpchp i2c_i801 i2c_core amb lpc_ich mfd_core rng_core processor thermal_sys button auth_rpcgss oid_registry nfs_acl lockd sunrpc loop autofs4 ext4 crc16 mbcache jbd2 dm_mod raidod hid_generic usbhid hid sg sd_mod crc_t10dif crct10dif_generic ses enclosure crct10dif_common ehci_pci uhci_hcd ehci_hcd tg3 ptp aacraid usbcore pps_cophy usb_common scsi_mod [last unloaded: nfsd]
[ 2996.630790] CPU: 7 PID: 11074 Comm: rpc.nfsd Tainted: G W I 3.16-2-amd64 #1 Debian 3.16.3-2
[ 2996.630792] Hardware name: IBM IBM eServer x3400-[7976ABG]-/M97IP, BIOS IBM BIOS Version 1.62-[SPE162AUS-1.62]- 11/09/2007
[ 2996.630794]  0000000000000009 ffffffff81506188 ffff8801b6813d98 ffffffff81065707
[ 2996.630797]  ffff8801b5d45e00 ffff8801b6813de8 ffff8801b5d45e00 0000000000000001
[ 2996.630800]  0000000000000000 ffffffff8106576c ffffffffa02e27a8 0000000000000018
[ 2996.630803] Call Trace:
[ 2996.630811]  [<ffffffff81506188>] ? dump_stack+0x41/0x51
[ 2996.630816]  [<ffffffff81065707>] ? warn_slowpath_common+0x77/0x90
[ 2996.630819]  [<ffffffff8106576c>] ? warn_slowpath_fmt+0x4c/0x50
[ 2996.630826]  [<ffffffffa02dd1aa>] ? nfsd4_umh_cltrack_init+0x3a/0x40 [nfsd]
[ 2996.630832]  [<ffffffffa02de461>] ? nfsd4_client_tracking_init+0x81/0x130 [nfsd]
[ 2996.630839]  [<ffffffffa02d8a62>] ? nfs4_state_start_net+0x2a2/0x340 [nfsd]
[ 2996.630844]  [<ffffffffa02b3b20>] ? nfsd_svc+0x1d0/0x330 [nfsd]
[ 2996.630850]  [<ffffffffa02b4600>] ? write_pool_threads+0x260/0x260 [nfsd]
[ 2996.630855]  [<ffffffffa02b468a>] ? write_threads+0x8a/0xf0 [nfsd]
[ 2996.630860]  [<ffffffff8113ecca>] ? __get_free_pages+0xa/0x50
[ 2996.630864]  [<ffffffff811ca5e0>] ? simple_transaction_get+0xa0/0xc0
[ 2996.630869]  [<ffffffffa02b4093>] ? nfsctl_transaction_write+0x43/0x70 [nfsd]
[ 2996.630873]  [<ffffffff811a52f2>] ? vfs_write+0xb2/0x1f0
[ 2996.630876]  [<ffffffff811a5e32>] ? SyS_write+0x42/0xa0
[ 2996.630880]  [<ffffffff8150c26d>] ? system_call_fast_compare_end+0x10/0x15
[ 2996.630882] ---[ end trace 61dda43e27c71f62 ]---
[ 2996.630887] ------------[ cut here ]------------
[ 2996.630894] WARNING: CPU: 7 PID: 11074 at /build/linux-P15SNz/linux-3.16.3/fs/nfsd/nfs4recover.c:530 nfsd4_legacy_tracking_init+0x1aa/0x240 [nfsd]()
[ 2996.630895] NFSD: attempt to initialize legacy client tracking in a container!
[ 2996.630999] Modules linked in: nfsd veth nfsv3 nfs fscache binfmt_misc bridge 8021q garp stp mrp llc iTCO_wdt iTCO_vendor_support ppdev joydev raid0 ttm coretemp drm_kms_helper psmouse drm kvm_intel evdev i5000_edac pcspkr i2c_algo_bit edac_core serio_raw kvm parport_pc parport shpchp i2c_i801 i2c_coamb lpc_ich mfd_core rng_core processor thermal_sys button auth_rpcgss oid_registry nfs_acl lockd sunrpc loop autofs4 ext4 crc16 mbcache jbd2 dm_mod raiod hid_generic usbhid hid sg sd_mod crc_t10dif crct10dif_generic ses enclosure crct10dif_common ehci_pci uhci_hcd ehci_hcd tg3 ptp aacraid usbcore pps_cphy usb_common scsi_mod [last unloaded: nfsd]
[ 2996.631043] CPU: 7 PID: 11074 Comm: rpc.nfsd Tainted: G W I 3.16-2-amd64 #1 Debian 3.16.3-2
[ 2996.631045] Hardware name: IBM IBM eServer x3400-[7976ABG]-/M97IP, BIOS IBM BIOS Version 1.62-[SPE162AUS-1.62]- 11/09/2007
[ 2996.631046]  0000000000000009 ffffffff81506188 ffff8801b6813d80 ffffffff81065707
[ 2996.631049]  ffff8801b5d45e00 ffff8801b6813dd0 0000000000004000 0000000000000001
[ 2996.631052]  0000000000000000 ffffffff8106576c ffffffffa02e2a28 ffff880100000018
[ 2996.631055] Call Trace:
[ 2996.631058]  [<ffffffff81506188>] ? dump_stack+0x41/0x51
[ 2996.631061]  [<ffffffff81065707>] ? warn_slowpath_common+0x77/0x90
[ 2996.631064]  [<ffffffff8106576c>] ? warn_slowpath_fmt+0x4c/0x50
[ 2996.631068]  [<ffffffff812b4258>] ? lockref_put_or_lock+0x48/0x80
[ 2996.631074]  [<ffffffffa02de2ca>] ? nfsd4_legacy_tracking_init+0x1aa/0x240 [nfsd]
[ 2996.631080]  [<ffffffffa02de431>] ? nfsd4_client_tracking_init+0x51/0x130 [nfsd]
[ 2996.631086]  [<ffffffffa02d8a62>] ? nfs4_state_start_net+0x2a2/0x340 [nfsd]
[ 2996.631091]  [<ffffffffa02b3b20>] ? nfsd_svc+0x1d0/0x330 [nfsd]
[ 2996.631097]  [<ffffffffa02b4600>] ? write_pool_threads+0x260/0x260 [nfsd]
[ 2996.631102]  [<ffffffffa02b468a>] ? write_threads+0x8a/0xf0 [nfsd]
[ 2996.631105]  [<ffffffff8113ecca>] ? __get_free_pages+0xa/0x50
[ 2996.631107]  [<ffffffff811ca5e0>] ? simple_transaction_get+0xa0/0xc0
[ 2996.631112]  [<ffffffffa02b4093>] ? nfsctl_transaction_write+0x43/0x70 [nfsd]
[ 2996.631116]  [<ffffffff811a52f2>] ? vfs_write+0xb2/0x1f0
[ 2996.631118]  [<ffffffff811a5e32>] ? SyS_write+0x42/0xa0
[ 2996.631121]  [<ffffffff8150c26d>] ? system_call_fast_compare_end+0x10/0x15
[ 2996.631123] ---[ end trace 61dda43e27c71f63 ]---
[ 2996.631125] NFSD: Unable to initialize client recovery tracking! (-22)
[ 2996.631127] NFSD: starting 90-second grace period (net ffff8801b6bfa0c0)
----------------------end trace -----------------------------
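
Since the traces are long, the NFSD-specific messages can be pulled out of the dmesg output on their own. The following is only a minimal sketch: the function name nfsd_messages is hypothetical, and it assumes the default "[seconds.microseconds]" timestamp prefix seen in the traces above.

```shell
#!/bin/sh
# Hypothetical helper: read dmesg-style text on stdin and print only the
# lines whose message starts with "NFSD: ", with the timestamp removed.
nfsd_messages() {
    sed -n 's/^\[ *[0-9.]*] *\(NFSD: .*\)$/\1/p'
}

# Intended use on the host:
#   dmesg | nfsd_messages
```

Applied to the trace above, this should leave only the "attempt to initialize ... client tracking in a container!" lines, the "Unable to initialize client recovery tracking! (-22)" line and the grace-period line.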

One last piece of information: the host (Debian Jessie) also runs an NFS client, so I suspect there may be a problem in the sysfs or namespace code?

Best regards,

--
--------------------------------------
 -- Jean-Marc LACROIX                 --
  -- mailto : jeanmarc.lacroix@free.fr --
    ---------------------------------------

