Bug#645366: [fuse-devel] Hang and suspend failure after FUSE server killed (3.1-rc7)



Ben Hutchings <ben@decadent.org.uk> writes:

> On Fri, 2011-10-14 at 22:52 +0000, brian m. carlson wrote:
>> Package: linux-2.6
>> Version: 3.1.0~rc7-1~experimental.1
>> Severity: normal
>> 
>> This morning I was backing up my laptop to another computer via sshfs
>> (and fuse).  The afio archiver was writing to this sshfs-mounted
>> location.  I decided to abort the operation with Ctrl-C, which caused
>> the sshfs mount to become unmounted; however, afio was apparently not
>> affected by the SIGINT (probably because processes in disk IO are
>> unkillable).
>> 
>> Several hours later, I attempted to suspend my computer and it failed to
>> do so. The kernel log (attached) indicated that the afio process from
>> hours before was preventing the suspend.  Since processes waiting on
>> disk IO are unkillable (IMO a bug) and the underlying device to which
>> afio was writing was long gone, I was forced to reboot the machine in
>> order to get it to suspend.  If I had not noticed that the machine had
>> failed to suspend, it could have stayed running in my bag and seriously
>> overheated.
>
> This seems to be a bug in FUSE.  Is this known about?  If not, could
> someone look into this?

It's a bug in the fuse-freezer interaction.  Yes, it is known.

Before suspending the machine all userspace tasks are frozen, which means
the freezer will wait until they exit the kernel (i.e. finish any system
calls).  If some task does not exit the kernel within a predefined time
then the freezer will give up and not let the machine be suspended.

Let's say task A is executing a system call that depends on task B to
finish.  In this case task B must not be frozen until task A is frozen,
otherwise the suspend will be unsuccessful.
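
To make the ordering problem concrete, here is a toy model (not kernel
code).  The function name try_freeze and the waits_on map are made up
for the illustration, and the model assumes each task waits on at most
one other task:

```python
# Toy model of the freezer ordering problem; not kernel code.

def try_freeze(order, waits_on):
    """Freeze tasks in the given order.

    waits_on maps a task name to the task it is blocked on inside a
    system call (or None if it is not blocked).  A task can only be
    frozen once it has left the kernel, and a blocked task can only
    leave the kernel while the task it waits on is still running.

    Returns the set of tasks that could not be frozen.
    """
    frozen = set()
    stuck = set()
    for task in order:
        server = waits_on.get(task)
        if server is not None and server in frozen:
            # The server can no longer answer, so the system call never
            # finishes and the freezer would eventually time out.
            stuck.add(task)
        else:
            # Either not blocked, or the server is still running and can
            # complete the request, letting the task exit the kernel.
            frozen.add(task)
    return stuck


waits_on = {"afio": "sshfs", "sshfs": None}

good = try_freeze(["afio", "sshfs"], waits_on)  # client first
bad = try_freeze(["sshfs", "afio"], waits_on)   # server first
print(good)  # set()
print(bad)   # {'afio'}
```

Freezing the client first works; freezing the server first leaves afio
stuck in the kernel, which is the failure in the report.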

One often-proposed solution is to try to order the freezing of userspace
tasks and leave the "task B"s for last.  The problem is that it's
impossible to know which task depends on which other task to be able to
make progress.  For example the kernel could guess that "sshfs" is
probably of the "task B" type because it's reading and writing
/dev/fuse.  But it's not going to guess that a certain "ssh" process is
also a "task B".  This is also complicated by the fact that a task could
be "task A" and "task B" at the same time...
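
As an illustration only: suppose the dependency edges were somehow known
(the kernel has no such map, so the depends_on argument below is purely
an assumption of the sketch, as is the function name freeze_order).
Ordering then becomes a topological sort.  It handles a chain where
sshfs is both "task B" for afio and "task A" for ssh, but it has no
answer when two tasks depend on each other.  Needs Python 3.9+ for
graphlib.

```python
from graphlib import CycleError, TopologicalSorter


def freeze_order(depends_on):
    """Return a freeze order in which every client comes before the
    tasks it depends on, or None if no such order exists.

    depends_on maps a task name to the set of tasks it needs in order
    to finish its system calls.
    """
    sorter = TopologicalSorter()
    for client, servers in depends_on.items():
        sorter.add(client)
        for server in servers:
            # A server may only be frozen after its client.
            sorter.add(server, client)
    try:
        return list(sorter.static_order())
    except CycleError:
        # Mutual dependency: no freezing order can work.
        return None


chain = freeze_order({"afio": {"sshfs"}, "sshfs": {"ssh"}})
loop = freeze_order({"fs1": {"fs2"}, "fs2": {"fs1"}})
print(chain)  # ['afio', 'sshfs', 'ssh']
print(loop)   # None
```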

Another suggested solution is to allow freezing of tasks that are
waiting for a fuse reply.  E.g.:

  http://thread.gmane.org/gmane.linux.power-management.general/25926

However, as described in that thread, that would only fix a subset of
the problems.  Also it would disrupt the operation of the freezer in
cases where it actually needs the userspace task to be out of the kernel
(cgroup freezer).

We discussed this issue recently with Rafael Wysocki, the power
management maintainer, and came to the conclusion that the best solution
is to allow suspend to go ahead even if some tasks are not frozen.  But
we need to be careful about only allowing tasks to remain unfrozen if
they are known to be outside of driver code.  For example we can mark
the task safe to suspend if it's inside any "well behaved" filesystem
(block filesystems, fuse, NFS, etc).

One important implementation question is: how to do this marking of
"safe" tasks without adding too much runtime and maintenance overhead to
the kernel.
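
Roughly, the decision the freezer would make under this proposal looks
like the sketch below.  It is a userspace model only, and says nothing
about how the marking would be done cheaply in the kernel, which is the
open question.  The names Task, suspend_safe and can_suspend are
hypothetical and do not correspond to existing kernel symbols.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    in_kernel: bool = False     # still inside a system call, cannot be frozen
    suspend_safe: bool = False  # hypothetical mark: blocked in "well behaved"
                                # filesystem code, known to be outside drivers


def can_suspend(tasks, allow_safe_unfrozen):
    """Decide whether suspend may go ahead.

    Without the proposal every task must be freezable.  With it, a task
    that cannot be frozen is tolerated as long as it is marked safe.
    """
    for task in tasks:
        if not task.in_kernel:
            continue  # left the kernel, so it can be frozen normally
        if allow_safe_unfrozen and task.suspend_safe:
            continue  # stuck, but known to be outside driver code
        return False
    return True


tasks = [
    Task("bash"),
    Task("afio", in_kernel=True, suspend_safe=True),  # waiting on a fuse reply
]

print(can_suspend(tasks, allow_safe_unfrozen=False))  # False
print(can_suspend(tasks, allow_safe_unfrozen=True))   # True
```

A task stuck in driver code would have in_kernel set but not
suspend_safe, and would still block the suspend in both modes.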

Ideas and patches are welcome.

Thanks,
Miklos

>
> Ben.
>
> [...]
>> Oct 14 12:50:07 lakeview kernel: [129960.588174] INFO: task afio:22818 blocked for more than 120 seconds.
>> Oct 14 12:50:07 lakeview kernel: [129960.588182] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Oct 14 12:50:07 lakeview kernel: [129960.588188] afio            D ffff880086e20300     0 22818      1 0x00000084
>> Oct 14 12:50:07 lakeview kernel: [129960.588199]  ffff880086e20300 0000000000000086 ffff8800065c1848 ffffffff81037a71
>> Oct 14 12:50:07 lakeview kernel: [129960.588210]  ffff88003687f120 0000000000012f00 ffff8800001effd8 ffff8800001effd8
>> Oct 14 12:50:07 lakeview kernel: [129960.588220]  0000000000012f00 ffff880086e20300 0000000000012f00 0000000000012f00
>> Oct 14 12:50:07 lakeview kernel: [129960.588231] Call Trace:
>> Oct 14 12:50:07 lakeview kernel: [129960.588246]  [<ffffffff81037a71>] ? __wake_up_common+0x41/0x78
>> Oct 14 12:50:07 lakeview kernel: [129960.588257]  [<ffffffff81344bb4>] ? _raw_spin_lock_irqsave+0x9/0x25
>> Oct 14 12:50:07 lakeview kernel: [129960.588282]  [<ffffffffa0577ab3>] ? fuse_request_send+0x1a2/0x251 [fuse]
>> Oct 14 12:50:07 lakeview kernel: [129960.588291]  [<ffffffff8106288b>] ? wake_up_bit+0x23/0x23
>> Oct 14 12:50:07 lakeview kernel: [129960.588316]  [<ffffffffa057dd2f>] ? fuse_flush+0xca/0xfe [fuse]
>> Oct 14 12:50:07 lakeview kernel: [129960.588322]  [<ffffffff810fcae7>] ? filp_close+0x3b/0x6a
>> Oct 14 12:50:07 lakeview kernel: [129960.588326]  [<ffffffff810fcb9d>] ? sys_close+0x87/0xc4
>> Oct 14 12:50:07 lakeview kernel: [129960.588331]  [<ffffffff81349e52>] ? system_call_fastpath+0x16/0x1b
> [...]


