
Bug#731439: linux: unable to handle kernel paging request at ffffffffffffffb8 when trying to remove a file in a checkpoint directory of a NFSv4 mount from a EMC VNx Storage



Package: src:linux
Version: 3.2.51-1
Severity: normal

Hi

We found the following by chance. We mount an NFSv4 export from
an EMC VNx storage system.

The volume contains checkpoints, which are read-only, whereas the rest of
the volume is read-write.

When we accidentally tried to remove a file in such a checkpoint tree, the
kernel oopsed (in our regular setup we also set
kernel.panic_on_oops = 1 and kernel.panic = 60, so that the machine
reboots):

----cut---------cut---------cut---------cut---------cut---------cut-----
[592812.122561] BUG: unable to handle kernel paging request at ffffffffffffffb8
[592812.126500] IP: [<ffffffffa02d4cf1>] nfs_mark_delegation_referenced+0x6/0x6 [nfs]
[592812.126500] PGD 1607067 PUD 1608067 PMD 0
[592812.126500] Oops: 0000 [#1] SMP
[592812.126500] CPU 1
[592812.126500] Modules linked in: xt_tcpudp iptable_filter ip_tables x_tables autofs4 binfmt_misc nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc loop powernow_k8 psmouse nv_tco mperf pcspkr evdev serio_raw amd64_edac_mod edac_mce_amd i2c_nforce2 k8temp shpchp i2c_core edac_core processor button thermal_sys ext4 crc16 jbd2 mbcache raid1 md_mod microcode sg sd_mod crc_t10dif usbhid sr_mod cdrom hid ata_generic pata_amd ohci_hcd sata_nv libata tg3 ehci_hcd libphy scsi_mod usbcore forcedeth usb_common [last unloaded: scsi_wait_scan]
[592812.126500]
[592812.126500] Pid: 25914, comm: rm Not tainted 3.2.0-4-amd64 #1 Debian 3.2.51-1 Sun Microsystems Sun Fire X2100 M2/S40
[592812.126500] RIP: 0010:[<ffffffffa02d4cf1>]  [<ffffffffa02d4cf1>] nfs_mark_delegation_referenced+0x6/0x6 [nfs]
[592812.126500] RSP: 0018:ffff880197979dc0  EFLAGS: 00010246
[592812.126500] RAX: 00000000ffffd8ca RBX: 00000000ffffd8ca RCX: 0000000000000000
[592812.126500] RDX: ffff880197979e08 RSI: 0000000000000001 RDI: 0000000000000000
[592812.126500] RBP: ffff880197979e08 R08: 0000000000000000 R09: dead000000200200
[592812.126500] R10: ffff8801988e4000 R11: ffff8801988e4000 R12: ffff88019759c800
[592812.126500] R13: ffff880197a7d400 R14: 0000000000000000 R15: 0000000000000000
[592812.126500] FS:  00007f879debb700(0000) GS:ffff88019fd00000(0000) knlGS:0000000000000000
[592812.126500] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[592813.924021] CR2: ffffffffffffffb8 CR3: 00000001969a1000 CR4: 00000000000006e0
[592813.924021] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[592813.924021] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[592813.924021] Process rm (pid: 25914, threadinfo ffff880197978000, task ffff88019531c280)
[592813.924021] Stack:
[592813.924021]  ffffffffa02c5d05 dead000000200200 ffff880123a669d0 ffff880123a79e20
[592813.924021]  ffff8801444689d0 ffff880123a669d0 0000000000000001 0000000000000000
[592813.924021]  ffffffffa02c6b77 0000000000000000 0000000000000000 0000000000000000
[592813.924021] Call Trace:
[592813.924021]  [<ffffffffa02c5d05>] ? nfs4_handle_exception+0x147/0x2c2 [nfs]
[592813.924021]  [<ffffffffa02c6b77>] ? nfs4_proc_remove+0x38/0x46 [nfs]
[592813.924021]  [<ffffffffa02ae8da>] ? nfs_unlink+0xef/0x18a [nfs]
[592813.924021]  [<ffffffff81104890>] ? vfs_unlink+0x68/0xbb
[592813.924021]  [<ffffffff8110578a>] ? do_unlinkat+0xd0/0x156
[592813.924021]  [<ffffffff810958cb>] ? __call_rcu+0x21/0x12c
[592813.924021]  [<ffffffff810f98c4>] ? sys_faccessat+0x145/0x155
[592813.924021]  [<ffffffff81354212>] ? system_call_fastpath+0x16/0x1b
[592813.924021] Code: df e8 78 4b ff ff 48 89 ef 89 44 24 08 e8 d8 fc ff ff 8b 44 24 08 48 83 c4 38 5b 5d 41 5c 41 5d 41 5e 41 5f c3 f0 80 4f 48 04 c3 <48> 8b 7f b8 31 c0 48 85 ff 74 16 8b 57 30 83 e6 03 21 f2 39 f2
[592813.924021] RIP  [<ffffffffa02d4cf1>] nfs_mark_delegation_referenced+0x6/0x6 [nfs]
[592813.924021]  RSP <ffff880197979dc0>
[592813.924021] CR2: ffffffffffffffb8
[592813.924021] ---[ end trace 6e2186ee7d0814e0 ]---
----cut---------cut---------cut---------cut---------cut---------cut-----
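
A note on reading the trace above: the faulting address in CR2 is a small
negative number when interpreted as a signed 64-bit value, RDI is 0, and the
bytes at the faulting instruction (<48> 8b 7f b8) appear to decode to
"mov -0x48(%rdi),%rdi". That reading was done by hand and not checked with a
disassembler, so treat it as an assumption. If it is right, the fault is
consistent with a NULL base pointer being dereferenced at offset -0x48. A
minimal sketch of the arithmetic, using only values copied from the log:

```python
# Values copied from the oops above; the instruction decoding is an assumption.
MASK64 = (1 << 64) - 1


def to_signed64(value):
    """Interpret an unsigned 64-bit value as two's-complement signed."""
    return value - (1 << 64) if value & (1 << 63) else value


def disp8(byte):
    """Sign-extend an 8-bit displacement byte."""
    return byte - 0x100 if byte & 0x80 else byte


cr2 = 0xFFFFFFFFFFFFFFB8               # faulting address from the log
rdi = 0x0                              # RDI from the register dump
code = bytes([0x48, 0x8B, 0x7F, 0xB8])  # bytes starting at the <48> marker

offset = disp8(code[3])                # last byte taken as the displacement
effective = (rdi + offset) & MASK64    # address the instruction would touch

print(hex(to_signed64(cr2)))           # CR2 as a signed offset
print(effective == cr2)                # does RDI + displacement match CR2?
```

This only shows that the register dump and CR2 agree with each other; it says
nothing about which caller passed the NULL pointer.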

This does not seem to happen with a 3.10.11-1~bpo70+1 kernel from
wheezy-backports, where we "correctly" get:

rm: cannot remove `foo': Input/output error

[Unfortunately I cannot easily reproduce this otherwise, and I was not able to
identify the commit that changed this behaviour.]
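
For anyone who wants to compare kernels, the check can be scripted. The
following is only a sketch: the helper name try_unlink and the example path
are hypothetical and not part of our setup, and on an affected 3.2 kernel the
call would be expected to trigger the oops rather than return.

```python
import errno
import os


def try_unlink(path):
    """Try to unlink path; return None on success, else the errno name."""
    try:
        os.unlink(path)
    except OSError as exc:
        # Map the numeric errno to its symbolic name, e.g. 5 -> "EIO".
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None


# Hypothetical usage against a file inside a read-only checkpoint tree:
# print(try_unlink("/mnt/nfs/.ckpt/foo"))
```

On the 3.10.11-1~bpo70+1 kernel, the rm output quoted above suggests this
would report EIO for such a file.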

Regards,
Salvatore

