[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#837907: stat() hangs on a particular file



Package: linux-image-3.16.0-4-amd64
Version: 3.16.36-1+deb8u1

One of our systems is suddenly unable to stat() a particular file
(cookies.sqlite-wal in a user's Firefox profile). Any attempt to
do so hangs in the system call, as shown by strace. The file resides
on an NFSv4 share (sec=krb5p). Other files in the same directory on
the same share remain accessible. The affected file is normally accessible
on the NFS server and from other NFS clients running the same kernel.

The user has reported a similar incident yesterday on some directories
on a different NFS share (also sec=krb5p, but hosted on a different
server). He rebooted to clear up the problem.

I'd like advice on how to troubleshoot this effectively. I've tried
rpcdebug -m {nfs,rpc,nlm} -s all but didn't see any smoking gun; maybe
some information cached by the kernel is suppressing NFS activity
associated with the stat() calls. The log entries I do see say
NFS: nfs_lookup_revalidate(cookies.sqlite-wal) is valid

Some related kernel traces from this system's logs (in chronological order,
with an intervening reboot; the first trace is associated with yesterday's
incident, the second trace is 2-3 minutes newer than the timestamp on
cookies.sqlite-wal):

[97483.663949] INFO: task ls:23767 blocked for more than 120 seconds.
[97483.663951]       Tainted: P        W  O  3.16.0-4-amd64 #1
[97483.663952] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[97483.663954] ls              D ffff8800afd3b808     0 23767      1 0x00000004
[97483.663957]  ffff8800afd3b3b0 0000000000000086 0000000000012f40 ffff88039a057fd8
[97483.663960]  0000000000012f40 ffff8800afd3b3b0 ffff88041ea137f0 ffff88041edcd128
[97483.663962]  0000000000000002 ffffffff8113eb50 ffff88039a057d60 ffff88039a057e40
[97483.663965] Call Trace:
[97483.663968]  [<ffffffff8113eb50>] ? wait_on_page_read+0x60/0x60
[97483.663971]  [<ffffffff81515069>] ? io_schedule+0x99/0x120
[97483.663974]  [<ffffffff8113eb5a>] ? sleep_on_page+0xa/0x10
[97483.663977]  [<ffffffff815153ec>] ? __wait_on_bit+0x5c/0x90
[97483.663980]  [<ffffffff8113e956>] ? wait_on_page_bit+0xc6/0xd0
[97483.663983]  [<ffffffff810a95c0>] ? autoremove_wake_function+0x30/0x30
[97483.663986]  [<ffffffff8114c22d>] ? pagevec_lookup_tag+0x1d/0x30
[97483.663989]  [<ffffffff8113ea30>] ? filemap_fdatawait_range+0xd0/0x160
[97483.663993]  [<ffffffff81140656>] ? filemap_write_and_wait+0x36/0x50
[97483.664002]  [<ffffffffa06c4c08>] ? nfs_getattr+0x108/0x220 [nfs]
[97483.664005]  [<ffffffff811af057>] ? vfs_fstatat+0x57/0x90
[97483.664009]  [<ffffffff811af51d>] ? SYSC_newlstat+0x1d/0x40
[97483.664013]  [<ffffffff8151854d>] ? system_call_fast_compare_end+0x10/0x15

[ 9724.415533] INFO: task mozStorage #5:2748 blocked for more than 120 seconds.
[ 9724.415537]       Tainted: P        W  O 3.16.0-4-amd64 #1
[ 9724.415538] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9724.415539] mozStorage #5   D ffff8803ee153a88     0  2748   2323 0x00000000
[ 9724.415542]  ffff8803ee153630 0000000000000082 0000000000012f40 ffff8803ee2cffd8
[ 9724.415544]  0000000000012f40 ffff8803ee153630 ffff88041ea537f0 ffff88041edaf7a0
[ 9724.415545]  0000000000000002 ffffffffa0669800 ffff8803ee2cfc30 ffff88040b1152c0
[ 9724.415547] Call Trace:
[ 9724.415557]  [<ffffffffa0669800>] ? nfs_pageio_doio+0x50/0x50 [nfs]
[ 9724.415560]  [<ffffffff81515069>] ? io_schedule+0x99/0x120
[ 9724.415563]  [<ffffffffa066980a>] ? nfs_wait_bit_uninterruptible+0xa/0x10 [nfs]
[ 9724.415566]  [<ffffffff815153ec>] ? __wait_on_bit+0x5c/0x90
[ 9724.415568]  [<ffffffff810738ba>] ? internal_add_timer+0x2a/0x70
[ 9724.415571]  [<ffffffffa0669800>] ? nfs_pageio_doio+0x50/0x50 [nfs]
[ 9724.415573]  [<ffffffff81515497>] ? out_of_line_wait_on_bit+0x77/0x90
[ 9724.415575]  [<ffffffff810a95c0>] ? autoremove_wake_function+0x30/0x30
[ 9724.415578]  [<ffffffffa066ef2e>] ? nfs_updatepage+0x15e/0x830 [nfs]
[ 9724.415582]  [<ffffffffa06609a7>] ? nfs_write_end+0x57/0x320 [nfs]
[ 9724.415585]  [<ffffffff81164d25>] ? iov_iter_copy_from_user_atomic+0x75/0x190
[ 9724.415588]  [<ffffffff8113f08b>] ? generic_perform_write+0x11b/0x1c0
[ 9724.415590]  [<ffffffff81140f98>] ? __generic_file_write_iter+0x158/0x340
[ 9724.415592]  [<ffffffff811411b9>] ? generic_file_write_iter+0x39/0xa0
[ 9724.415595]  [<ffffffffa065fdb3>] ? nfs_file_write+0x83/0x1a0 [nfs]
[ 9724.415598]  [<ffffffff811aa134>] ? new_sync_write+0x74/0xa0
[ 9724.415600]  [<ffffffff811aa872>] ? vfs_write+0xb2/0x1f0
[ 9724.415601]  [<ffffffff811ab3b2>] ? SyS_write+0x42/0xa0
[ 9724.415603]  [<ffffffff811aa393>] ? SyS_lseek+0x43/0xa0
[ 9724.415605]  [<ffffffff8151854d>] ? system_call_fast_compare_end+0x10/0x15


Reply to: