
Bug#847549: kernel bug dcache.c 2373 invalid opcode 0000 d_materialise_unique



Package: linux
Version: 3.16.36-1+deb8u1
Severity: important

The system is an NFS server running linux-image-3.16.0-4-amd64.

At times of heavy load on NFS, such as "git checkout some-branch" in a
large repository, the system crashes (dmesg output attached).  It has
been happening regularly since the upgrade to jessie and with every
kernel released through stable updates.  I don't recall seeing this in
wheezy.

I've installed kdump-tools.  Sometimes it captures the dmesg output; a
recent example, from a crash on 2016-12-02, is attached.  I'm not sure
whether the crashes that left no logs in /var/crash are the same bug.

The same crash was reported[1] on linux-fsdevel by another Debian user.

I don't mind trying a backports kernel as a workaround, but can anybody
comment on whether the backports kernel 4.7.8-1~bpo8+1 will match with
the NFS user space packages in jessie, or do I need to update some of
those packages too?

Having seen various crashes without stack dumps, I also tried checking
the RAM in the machine.  memtest86+ did not report any errors, with
either the upgraded 8GB or the original 2GB from the vendor.  The
machine had 8GB when it crashed on 2016-12-02 and only 2GB when it
crashed again on 2016-12-07.

Regards,

Daniel


1. http://www.spinics.net/lists/linux-fsdevel/msg98540.html
[322559.833066] ------------[ cut here ]------------
[322559.833185] kernel BUG at /build/linux-EZT6bx/linux-3.16.36/fs/dcache.c:2373!
[322559.833346] invalid opcode: 0000 [#1] SMP 
[322559.833449] Modules linked in: cpufreq_conservative cpufreq_stats cpufreq_powersave cpufreq_userspace 8021q garp stp mrp llc binfmt_misc xfrm_user xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 af_key xfrm_algo nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc evdev radeon kvm_amd kvm ttm drm_kms_helper drm edac_mce_amd edac_core k10temp i2c_algo_bit pcspkr shpchp acpi_cpufreq sp5100_tco tpm_infineon tpm_tis tpm button i2c_piix4 i2c_core processor thermal_sys loop fuse parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache jbd2 crc32c_generic btrfs xor raid6_pq hid_generic usbhid hid dm_mod raid1 md_mod sg sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic usb_storage ohci_pci pata_atiixp ahci libahci ehci_pci tg3 ohci_hcd ehci_hcd ptp pps_core libphy libata scsi_mod
[322559.835542]  usbcore usb_common
[322559.835598] CPU: 0 PID: 1762 Comm: nfsd Not tainted 3.16.0-4-amd64 #1 Debian 3.16.36-1+deb8u1
[322559.835787] Hardware name: HP ProLiant MicroServer, BIOS O41     01/17/2011
[322559.835944] task: ffff8800d5ab0ca0 ti: ffff88020e388000 task.ti: ffff88020e388000
[322559.836110] RIP: 0010:[<ffffffff811c01c3>]  [<ffffffff811c01c3>] __d_rehash+0x53/0x60
[322559.836298] RSP: 0018:ffff88020e38bb58  EFLAGS: 00010282
[322559.836416] RAX: 00000000000eb87f RBX: ffff8800a2471990 RCX: 000000000000000c
[322559.836575] RDX: ffffc90000010000 RSI: ffffc9000076c3f8 RDI: ffff8800b9ba3498
[322559.836733] RBP: ffff8800b9ba3498 R08: ffff8800b9ba3410 R09: 6cf0c77314797b26
[322559.836892] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800b9ba33d8
[322559.837052] R13: ffff8800b9ba34f0 R14: ffff88001ebfb6d8 R15: ffff88020e38bc58
[322559.837212] FS:  00007f376c212700(0000) GS:ffff88021fc00000(0000) knlGS:0000000000000000
[322559.837389] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[322559.837518] CR2: 00007f98951a5650 CR3: 0000000215be1000 CR4: 00000000000007f0
[322559.837676] Stack:
[322559.837723]  ffffffff811c1bac ffff88001ebfb6d8 ffff8800b9ba33d8 ffff88001ebfb6d8
[322559.837911]  ffff88001ebfb6d8 ffff880214978560 ffff88001ebfb6d8 ffff88020e38bc58
[322559.838097]  ffffffff811b3739 ffff880214978560 0000000000000000 ffffffff811b3fcf
[322559.838284] Call Trace:
[322559.838348]  [<ffffffff811c1bac>] ? d_materialise_unique+0x7c/0x3b0
[322559.838493]  [<ffffffff811b3739>] ? lookup_real+0x19/0x50
[322559.838617]  [<ffffffff811b3fcf>] ? __lookup_hash+0x2f/0x40
[322559.838744]  [<ffffffff811b6f5d>] ? lookup_one_len+0xcd/0x120
[322559.838877]  [<ffffffff8121f78a>] ? reconnect_path+0x10a/0x2c0
[322559.839021]  [<ffffffffa070f7a0>] ? nfsd_proc_getattr+0xa0/0xa0 [nfsd]
[322559.839173]  [<ffffffff8121fa7f>] ? exportfs_decode_fh+0xef/0x2c0
[322559.839319]  [<ffffffffa07147e6>] ? expkey_match+0x36/0x40 [nfsd]
[322559.839470]  [<ffffffffa0549f04>] ? cache_check+0xf4/0x3b0 [sunrpc]
[322559.839620]  [<ffffffffa0715191>] ? exp_find+0xe1/0x190 [nfsd]
[322559.839754]  [<ffffffff810a45f1>] ? pick_next_task_fair+0x6e1/0x820
[322559.839897]  [<ffffffff8101255e>] ? __switch_to+0xde/0x5a0
[322559.840028]  [<ffffffffa0710415>] ? fh_verify+0x2e5/0x5c0 [nfsd]
[322559.840173]  [<ffffffffa05433e0>] ? unix_gid_show+0x110/0x110 [sunrpc]
[322559.840328]  [<ffffffffa0719f43>] ? nfsd3_proc_getattr+0x63/0xe0 [nfsd]
[322559.840484]  [<ffffffffa070cd32>] ? nfsd_dispatch+0xb2/0x200 [nfsd]
[322559.840635]  [<ffffffffa053fd3b>] ? svc_process_common+0x41b/0x670 [sunrpc]
[322559.840798]  [<ffffffffa070c630>] ? nfsd_destroy+0x70/0x70 [nfsd]
[322559.840945]  [<ffffffffa054009c>] ? svc_process+0x10c/0x160 [sunrpc]
[322559.841097]  [<ffffffffa070c6ef>] ? nfsd+0xbf/0x130 [nfsd]
[322559.841227]  [<ffffffff810894bd>] ? kthread+0xbd/0xe0
[322559.841344]  [<ffffffff81089400>] ? kthread_create_on_node+0x180/0x180
[322559.841492]  [<ffffffff81518498>] ? ret_from_fork+0x58/0x90
[322559.841625]  [<ffffffff81089400>] ? kthread_create_on_node+0x180/0x180
[322559.841770] Code: 89 47 08 74 04 48 89 50 08 48 89 77 10 48 83 ca 01 48 89 16 0f ba 36 00 c3 0f 1f 80 00 00 00 00 f3 90 48 8b 06 a8 01 75 f7 eb b9 <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 48 8b 47 
[322559.842589] RIP  [<ffffffff811c01c3>] __d_rehash+0x53/0x60
[322559.842717]  RSP <ffff88020e38bb58>
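
Since it is unclear whether every crash is the same bug, one way to compare
captured dmesg files is to reduce each to a short signature: the BUG site, the
faulting function from the final RIP line, and the first few call-trace
frames. The following is only a rough sketch, not part of the original report.
The function name crash_signature, the depth parameter and the regular
expressions are assumptions based on the oops format shown above; other
kernel versions may print oopses differently.

```python
import re

# Patterns assumed from the 3.16 oops format shown above.
BUG_RE = re.compile(r"kernel BUG at (\S+):(\d+)!")
RIP_RE = re.compile(r"RIP\s+\[<[0-9a-f]+>\]\s+(\w+)\+0x")
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?([\w.]+)\+0x")


def crash_signature(dmesg_text, depth=3):
    """Return (bug_site, rip_function, first `depth` call-trace frames)."""
    bug = BUG_RE.search(dmesg_text)
    rip = RIP_RE.search(dmesg_text)
    # Only look at frames between "Call Trace:" and the "Code:" line.
    after_trace = dmesg_text.partition("Call Trace:")[2]
    trace_body = after_trace.partition("Code:")[0]
    frames = FRAME_RE.findall(trace_body)
    bug_site = None
    if bug:
        # Keep only the file name so build paths do not affect the comparison.
        bug_site = "{}:{}".format(bug.group(1).rsplit("/", 1)[-1], bug.group(2))
    return (bug_site, rip.group(1) if rip else None, tuple(frames[:depth]))


# A few lines copied from the trace above, used as sample input.
SAMPLE = """\
[322559.833185] kernel BUG at /build/linux-EZT6bx/linux-3.16.36/fs/dcache.c:2373!
[322559.838284] Call Trace:
[322559.838348]  [<ffffffff811c1bac>] ? d_materialise_unique+0x7c/0x3b0
[322559.838493]  [<ffffffff811b3739>] ? lookup_real+0x19/0x50
[322559.838617]  [<ffffffff811b3fcf>] ? __lookup_hash+0x2f/0x40
[322559.841770] Code: 89 47 08 74 04
[322559.842589] RIP  [<ffffffff811c01c3>] __d_rehash+0x53/0x60
"""

if __name__ == "__main__":
    print(crash_signature(SAMPLE))
```

Two dumps that produce the same tuple very likely hit the same BUG in the
same code path; a crash that left nothing in /var/crash cannot be classified
this way.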

