[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#837907: marked as done (stat() hangs on a particular file)



Your message dated Mon, 10 May 2021 09:36:41 +0200
with message-id <20210510073641.GA1289@lorien.valinor.li>
and subject line Re: Bug#837907: stat() hangs on a particular file
has caused the Debian Bug report #837907,
regarding stat() hangs on a particular file
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
837907: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=837907
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: linux-image-3.16.0-4-amd64
Version: 3.16.36-1+deb8u1

One of our systems is suddenly unable to stat() a particular file
(cookies.sqlite-wal in a user's Firefox profile). Any attempt to
do so hangs in the system call, as shown by strace. The file resides
on an NFSv4 share (sec=krb5p). Other files in the same directory on
the same share remain accessible. The affected file is normally accessible
on the NFS server and from other NFS clients running the same kernel.

The user has reported a similar incident yesterday on some directories
on a different NFS share (also sec=krb5p, but hosted on a different
server). He rebooted to clear up the problem.

I'd like advice on how to troubleshoot this effectively. I've tried
rpcdebug -m {nfs,rpc,nlm} -s all but didn't see any smoking gun; maybe
some information cached by the kernel is suppressing NFS activity
associated with the stat() calls. The log entries I do see say
NFS: nfs_lookup_revalidate(cookies.sqlite-wal) is valid

Some related kernel traces from this system's logs (in chronological order,
with an intervening reboot; the first trace is associated with yesterday's
incident, the second trace is 2-3 minutes newer than the timestamp on
cookies.sqlite-wal):

[97483.663949] INFO: task ls:23767 blocked for more than 120 seconds.
[97483.663951]       Tainted: P        W  O  3.16.0-4-amd64 #1
[97483.663952] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[97483.663954] ls              D ffff8800afd3b808     0 23767      1 0x00000004
[97483.663957]  ffff8800afd3b3b0 0000000000000086 0000000000012f40 ffff88039a057fd8
[97483.663960]  0000000000012f40 ffff8800afd3b3b0 ffff88041ea137f0 ffff88041edcd128
[97483.663962]  0000000000000002 ffffffff8113eb50 ffff88039a057d60 ffff88039a057e40
[97483.663965] Call Trace:
[97483.663968]  [<ffffffff8113eb50>] ? wait_on_page_read+0x60/0x60
[97483.663971]  [<ffffffff81515069>] ? io_schedule+0x99/0x120
[97483.663974]  [<ffffffff8113eb5a>] ? sleep_on_page+0xa/0x10
[97483.663977]  [<ffffffff815153ec>] ? __wait_on_bit+0x5c/0x90
[97483.663980]  [<ffffffff8113e956>] ? wait_on_page_bit+0xc6/0xd0
[97483.663983]  [<ffffffff810a95c0>] ? autoremove_wake_function+0x30/0x30
[97483.663986]  [<ffffffff8114c22d>] ? pagevec_lookup_tag+0x1d/0x30
[97483.663989]  [<ffffffff8113ea30>] ? filemap_fdatawait_range+0xd0/0x160
[97483.663993]  [<ffffffff81140656>] ? filemap_write_and_wait+0x36/0x50
[97483.664002]  [<ffffffffa06c4c08>] ? nfs_getattr+0x108/0x220 [nfs]
[97483.664005]  [<ffffffff811af057>] ? vfs_fstatat+0x57/0x90
[97483.664009]  [<ffffffff811af51d>] ? SYSC_newlstat+0x1d/0x40
[97483.664013]  [<ffffffff8151854d>] ? system_call_fast_compare_end+0x10/0x15

[ 9724.415533] INFO: task mozStorage #5:2748 blocked for more than 120 seconds.
[ 9724.415537]       Tainted: P        W  O 3.16.0-4-amd64 #1
[ 9724.415538] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9724.415539] mozStorage #5   D ffff8803ee153a88     0  2748   2323 0x00000000
[ 9724.415542]  ffff8803ee153630 0000000000000082 0000000000012f40 ffff8803ee2cffd8
[ 9724.415544]  0000000000012f40 ffff8803ee153630 ffff88041ea537f0 ffff88041edaf7a0
[ 9724.415545]  0000000000000002 ffffffffa0669800 ffff8803ee2cfc30 ffff88040b1152c0
[ 9724.415547] Call Trace:
[ 9724.415557]  [<ffffffffa0669800>] ? nfs_pageio_doio+0x50/0x50 [nfs]
[ 9724.415560]  [<ffffffff81515069>] ? io_schedule+0x99/0x120
[ 9724.415563]  [<ffffffffa066980a>] ? nfs_wait_bit_uninterruptible+0xa/0x10 [nfs]
[ 9724.415566]  [<ffffffff815153ec>] ? __wait_on_bit+0x5c/0x90
[ 9724.415568]  [<ffffffff810738ba>] ? internal_add_timer+0x2a/0x70
[ 9724.415571]  [<ffffffffa0669800>] ? nfs_pageio_doio+0x50/0x50 [nfs]
[ 9724.415573]  [<ffffffff81515497>] ? out_of_line_wait_on_bit+0x77/0x90
[ 9724.415575]  [<ffffffff810a95c0>] ? autoremove_wake_function+0x30/0x30
[ 9724.415578]  [<ffffffffa066ef2e>] ? nfs_updatepage+0x15e/0x830 [nfs]
[ 9724.415582]  [<ffffffffa06609a7>] ? nfs_write_end+0x57/0x320 [nfs]
[ 9724.415585]  [<ffffffff81164d25>] ? iov_iter_copy_from_user_atomic+0x75/0x190
[ 9724.415588]  [<ffffffff8113f08b>] ? generic_perform_write+0x11b/0x1c0
[ 9724.415590]  [<ffffffff81140f98>] ? __generic_file_write_iter+0x158/0x340
[ 9724.415592]  [<ffffffff811411b9>] ? generic_file_write_iter+0x39/0xa0
[ 9724.415595]  [<ffffffffa065fdb3>] ? nfs_file_write+0x83/0x1a0 [nfs]
[ 9724.415598]  [<ffffffff811aa134>] ? new_sync_write+0x74/0xa0
[ 9724.415600]  [<ffffffff811aa872>] ? vfs_write+0xb2/0x1f0
[ 9724.415601]  [<ffffffff811ab3b2>] ? SyS_write+0x42/0xa0
[ 9724.415603]  [<ffffffff811aa393>] ? SyS_lseek+0x43/0xa0
[ 9724.415605]  [<ffffffff8151854d>] ? system_call_fast_compare_end+0x10/0x15

--- End Message ---
--- Begin Message ---
Hi Sergio,

On Mon, May 10, 2021 at 09:13:36AM +0200, Sergio Gelato wrote:
> * Salvatore Bonaccorso [2021-05-09 22:37:02 +0200]:
> > Control: tags -1 + moreinfo
> > 
> > Do you still can reproduce the issue with a recent kernel?
> 
> No, I haven't seen it in a while (and never with 4.9 or 4.19 either; nor with
> 5.10 but that's statistically less significant).

Thanks for reporting back. So let's close the issue given all of the
above versions are the current supported one either in regular support
or LTS.

Regards,
Salvatore

--- End Message ---

Reply to: