Why do NBD requests prevent hibernation, and FUSE requests do not?
Hello,
I am comparing the behavior of FUSE and NBD when attempting to hibernate
the system.
FUSE seems to be mostly compatible: I am able to suspend the system even
when there is ongoing I/O on the FUSE filesystem.
With NBD, on the other hand, most I/O seems to prevent hibernation of the
system. Example hibernation error:
kernel: Freezing user space processes ...
kernel: Freezing of tasks failed after 20.003 seconds (1 tasks refusing to freeze, wq_busy=0):
kernel: task:rsync state:D stack: 0 pid:348105 ppid:348104 flags:0x00004004
kernel: Call Trace:
kernel: <TASK>
kernel: __schedule+0x308/0x9e0
kernel: schedule+0x4e/0xb0
kernel: schedule_timeout+0x88/0x150
kernel: ? __bpf_trace_tick_stop+0x10/0x10
kernel: io_schedule_timeout+0x4c/0x80
kernel: __cv_timedwait_common+0x129/0x160 [spl]
kernel: ? dequeue_task_stop+0x70/0x70
kernel: __cv_timedwait_io+0x15/0x20 [spl]
kernel: zio_wait+0x129/0x2b0 [zfs]
kernel: dmu_buf_hold+0x5b/0x90 [zfs]
kernel: zap_lockdir+0x4e/0xb0 [zfs]
kernel: zap_cursor_retrieve+0x1ae/0x320 [zfs]
kernel: ? dbuf_prefetch+0xf/0x20 [zfs]
kernel: ? dmu_prefetch+0xc8/0x200 [zfs]
kernel: zfs_readdir+0x12a/0x440 [zfs]
kernel: ? preempt_count_add+0x68/0xa0
kernel: ? preempt_count_add+0x68/0xa0
kernel: ? aa_file_perm+0x120/0x4c0
kernel: ? rrw_exit+0x65/0x150 [zfs]
kernel: ? _copy_to_user+0x21/0x30
kernel: ? cp_new_stat+0x150/0x180
kernel: zpl_iterate+0x4c/0x70 [zfs]
kernel: iterate_dir+0x171/0x1c0
kernel: __x64_sys_getdents64+0x78/0x110
kernel: ? __ia32_sys_getdents64+0x110/0x110
kernel: do_syscall_64+0x38/0xc0
kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
kernel: RIP: 0033:0x7f03c897a9c7
kernel: RSP: 002b:00007ffd41e3c518 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9
kernel: RAX: ffffffffffffffda RBX: 0000561eff64dd40 RCX: 00007f03c897a9c7
kernel: RDX: 0000000000008000 RSI: 0000561eff64dd70 RDI: 0000000000000000
kernel: RBP: 0000561eff64dd70 R08: 0000000000000030 R09: 00007f03c8a72be0
kernel: R10: 0000000000020000 R11: 0000000000000293 R12: ffffffffffffff80
kernel: R13: 0000561eff64dd44 R14: 0000000000000000 R15: 0000000000000001
kernel: </TASK>
(This is with ZFS on top of the NBD device.)
As far as I can tell, the problem is that while an NBD request is
pending, the task that waits for the result (in this case *rsync*) is
refusing to freeze. This happens even when setting a 5 minute timeout
for freezing (which is more than enough time for the NBD request to
complete), so I suspect that the NBD server task (in this case nbdkit)
has already been frozen and is thus unable to make progress.
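
For anyone who wants to check this on their own system, here is a minimal
sketch that lists processes currently in uninterruptible sleep (state D),
which is the state rsync shows in the trace above. It only reads
/proc/<pid>/stat. The helper names parse_stat and tasks_in_state are made up
for illustration, and the scan covers thread-group leaders only, not
individual threads.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) parsed from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = stat_line.index("(")
    rparen = stat_line.rindex(")")
    pid = int(stat_line[:lparen].strip())
    comm = stat_line[lparen + 1:rparen]
    # The state letter is the first field after the closing parenthesis.
    state = stat_line[rparen + 1:].split()[0]
    return pid, comm, state


def tasks_in_state(wanted="D", proc_root="/proc"):
    """Return [(pid, comm), ...] for processes whose state equals `wanted`."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan, or the line was malformed
        if state == wanted:
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in tasks_in_state("D"):
        print(pid, comm)
```

Running this just before a hibernation attempt should show whether rsync
(or anything else) is already blocked in state D. As far as I understand,
the freeze timeout I changed is the value in /sys/power/pm_freeze_timeout,
given in milliseconds.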
However, I do not understand why the same is not happening for FUSE
(with FUSE requests being stuck because the FUSE daemon is already
frozen). Was I just very lucky in my tests? Or are tasks waiting for
FUSE requests in a different kind of state? Or is NBD a red herring here,
and the real trouble is with ZFS?
It would be great if someone could shed some light on what's going on.
Best,
-Nikolaus
--
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F
»Time flies like an arrow, fruit flies like a Banana.«