
jbd2 tid wrap seen on NFS server



On Sat, 2013-03-16 at 16:12 +1100, George Barnett wrote:
> Hi, 
> 
> We use debian for a number of machines in our storage infrastructure
> and we have recently been seeing a number of "hangs". We primarily
> notice this when nfsd processes lock up and the hung task detector
> then goes wild. We finally managed to get a trace last night; it's
> pasted below:
> 
> We did not see this crash under the 2.6.39 backport; however, that
> kernel spontaneously rebooted at ~200 days of uptime (about 3/4 of our
> infrastructure rebooted within a few weeks, and it was not a good time
> for our ops teams).
> 
> I would be grateful if anyone who can help narrow this down would
> jump in, either to ask for further info or to offer further advice.

There was some discussion on the ext4 development list in December about
why this warning can appear and how it should be dealt with.  It doesn't
seem to have actually been resolved, though.  I'm cc'ing that list in
the hope of either reviving that discussion or finding out what the fix
is.

Ben.

> [11309697.466397] ------------[ cut here ]------------ 
> [11309697.466556] WARNING: at /build/buildd-linux_3.2.23-1~bpo60+2-amd64-oLufer/linux-3.2.23/fs/jbd2/journal.c:507 __jbd2_log_start_commit+0x7e/0x8c [jbd2]()
> [11309697.466660] Hardware name: X8DT6
> [11309697.466728] JBD2: bad log_start_commit: 2205591757 2205591757 14613566 0
> [11309697.466798] Modules linked in: netconsole autofs4 8021q garp bridge stp nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc bonding tcp_htcp ext4 jbd2 crc16 configfs loop ohci_hcd tpm_tis tpm i7core_edac i2c_i801 snd_pcm snd_timer snd soundcore edac_core i2c_core ioatdma tpm_bios snd_page_alloc coretemp crc32c_intel psmouse pcspkr joydev evdev acpi_cpufreq mperf processor serio_raw button thermal_sys ext3 jbd mbcache usbhid hid sd_mod ses enclosure crc_t10dif uhci_hcd ahci libahci libata igb ehci_hcd e1000e usbcore dca megaraid_sas usb_common scsi_mod [last unloaded: netconsole]
> [11309697.470190] Pid: 62, comm: kswapd0 Not tainted 3.2.0-0.bpo.3-amd64 #1
> [11309697.470261] Call Trace:
> [11309697.470329] [<ffffffff810498a8>] ? warn_slowpath_common+0x78/0x8c
> [11309697.470399] [<ffffffff8104995a>] ? warn_slowpath_fmt+0x45/0x4a
> [11309697.470471] [<ffffffffa01cabad>] ? __jbd2_log_start_commit+0x7e/0x8c [jbd2]
> [11309697.470558] [<ffffffffa01cac83>] ? jbd2_log_start_commit+0x21/0x2f [jbd2]
> [11309697.470634] [<ffffffffa02dee7a>] ? ext4_evict_inode+0x86/0x2d1 [ext4]
> [11309697.470707] [<ffffffff81119626>] ? evict+0x9a/0x14e
> [11309697.470775] [<ffffffff811198b4>] ? dispose_list+0x35/0x3f
> [11309697.470844] [<ffffffff81119b87>] ? prune_icache_sb+0x2c9/0x2d8
> [11309697.470915] [<ffffffff811081b0>] ? prune_super+0xd6/0x147
> [11309697.470987] [<ffffffff810cb9e2>] ? shrink_slab+0x1a3/0x266
> [11309697.471056] [<ffffffff810cd937>] ? balance_pgdat+0x335/0x625
> [11309697.471126] [<ffffffff810cdf31>] ? kswapd+0x30a/0x325
> [11309697.471196] [<ffffffff81063815>] ? wake_up_bit+0x20/0x20
> [11309697.471265] [<ffffffff810cdc27>] ? balance_pgdat+0x625/0x625
> [11309697.471334] [<ffffffff810cdc27>] ? balance_pgdat+0x625/0x625
> [11309697.471403] [<ffffffff810633d9>] ? kthread+0x7a/0x82
> [11309697.471472] [<ffffffff8136d3f4>] ? kernel_thread_helper+0x4/0x10
> [11309697.471543] [<ffffffff8106335f>] ? kthread_worker_fn+0x147/0x147
> [11309697.471613] [<ffffffff8136d3f0>] ? gs_change+0x13/0x13
> [11309697.471680] ---[ end trace 56d2be5ea52d0917 ]---

-- 
Ben Hutchings
It is easier to change the specification to fit the program than vice versa.
