
Why do NBD requests prevent hibernation, while FUSE requests do not?



Hello,

I am comparing the behavior of FUSE and NBD when attempting to hibernate
the system.

FUSE seems to be mostly compatible: I am able to suspend the system even
when there is ongoing I/O on the FUSE filesystem.

With NBD, on the other hand, most I/O seems to prevent the system from
hibernating. Example hibernation error:

  kernel: Freezing user space processes ... 
  kernel: Freezing of tasks failed after 20.003 seconds (1 tasks refusing to freeze, wq_busy=0):
  kernel: task:rsync           state:D stack:    0 pid:348105 ppid:348104 flags:0x00004004
  kernel: Call Trace:
  kernel:  <TASK>
  kernel:  __schedule+0x308/0x9e0
  kernel:  schedule+0x4e/0xb0
  kernel:  schedule_timeout+0x88/0x150
  kernel:  ? __bpf_trace_tick_stop+0x10/0x10
  kernel:  io_schedule_timeout+0x4c/0x80
  kernel:  __cv_timedwait_common+0x129/0x160 [spl]
  kernel:  ? dequeue_task_stop+0x70/0x70
  kernel:  __cv_timedwait_io+0x15/0x20 [spl]
  kernel:  zio_wait+0x129/0x2b0 [zfs]
  kernel:  dmu_buf_hold+0x5b/0x90 [zfs]
  kernel:  zap_lockdir+0x4e/0xb0 [zfs]
  kernel:  zap_cursor_retrieve+0x1ae/0x320 [zfs]
  kernel:  ? dbuf_prefetch+0xf/0x20 [zfs]
  kernel:  ? dmu_prefetch+0xc8/0x200 [zfs]
  kernel:  zfs_readdir+0x12a/0x440 [zfs]
  kernel:  ? preempt_count_add+0x68/0xa0
  kernel:  ? preempt_count_add+0x68/0xa0
  kernel:  ? aa_file_perm+0x120/0x4c0
  kernel:  ? rrw_exit+0x65/0x150 [zfs]
  kernel:  ? _copy_to_user+0x21/0x30
  kernel:  ? cp_new_stat+0x150/0x180
  kernel:  zpl_iterate+0x4c/0x70 [zfs]
  kernel:  iterate_dir+0x171/0x1c0
  kernel:  __x64_sys_getdents64+0x78/0x110
  kernel:  ? __ia32_sys_getdents64+0x110/0x110
  kernel:  do_syscall_64+0x38/0xc0
  kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
  kernel: RIP: 0033:0x7f03c897a9c7
  kernel: RSP: 002b:00007ffd41e3c518 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9
  kernel: RAX: ffffffffffffffda RBX: 0000561eff64dd40 RCX: 00007f03c897a9c7
  kernel: RDX: 0000000000008000 RSI: 0000561eff64dd70 RDI: 0000000000000000
  kernel: RBP: 0000561eff64dd70 R08: 0000000000000030 R09: 00007f03c8a72be0
  kernel: R10: 0000000000020000 R11: 0000000000000293 R12: ffffffffffffff80
  kernel: R13: 0000561eff64dd44 R14: 0000000000000000 R15: 0000000000000001
  kernel:  </TASK>

(this is with ZFS on top of the NBD device).
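When comparing many failed attempts, it can help to pull the key fields out of a freezer failure like the one above: the timeout, the number of refusing tasks, and each task's name, state and PIDs. Below is a minimal Python sketch; the regular expressions are assumptions derived from this single excerpt and may not match the output of other kernel versions.

```python
import re

# Patterns assumed from the one log excerpt above; other kernels may differ.
SUMMARY_RE = re.compile(
    r"Freezing of tasks failed after (?P<secs>[\d.]+) seconds "
    r"\((?P<count>\d+) tasks refusing to freeze, wq_busy=(?P<wq_busy>\d+)\)"
)
# Lazy ".*?pid:" matches the first "pid:" field, before "ppid:".
TASK_RE = re.compile(
    r"task:(?P<comm>\S+)\s+state:(?P<state>\S)\s+.*?"
    r"pid:\s*(?P<pid>\d+)\s+ppid:\s*(?P<ppid>\d+)"
)


def summarize_freeze_failure(log_text):
    """Return (summary, tasks) extracted from a freezer-failure log."""
    summary = None
    tasks = []
    for line in log_text.splitlines():
        m = SUMMARY_RE.search(line)
        if m:
            summary = {
                "seconds": float(m["secs"]),
                "refusing": int(m["count"]),
                "wq_busy": int(m["wq_busy"]),
            }
            continue
        m = TASK_RE.search(line)
        if m:
            tasks.append({
                "comm": m["comm"],
                "state": m["state"],
                "pid": int(m["pid"]),
                "ppid": int(m["ppid"]),
            })
    return summary, tasks


sample = (
    "kernel: Freezing of tasks failed after 20.003 seconds "
    "(1 tasks refusing to freeze, wq_busy=0):\n"
    "kernel: task:rsync           state:D stack:    0 pid:348105 "
    "ppid:348104 flags:0x00004004\n"
)
summary, tasks = summarize_freeze_failure(sample)
print(summary)  # {'seconds': 20.003, 'refusing': 1, 'wq_busy': 0}
print(tasks)    # [{'comm': 'rsync', 'state': 'D', 'pid': 348105, 'ppid': 348104}]
```

Here the refusing task is rsync in state D (uninterruptible sleep), with the call trace showing it waiting in `zio_wait` for I/O that has to go through the NBD device.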


As far as I can tell, the problem is that while an NBD request is
pending, the task that waits for the result (in this case *rsync*)
refuses to freeze. This happens even when setting a 5-minute timeout
for freezing (which is more than enough time for the NBD request to
complete), so I suspect that the NBD server task (in this case nbdkit)
has already been frozen and is thus unable to make progress.
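To check which tasks are stuck in uninterruptible sleep (state D, like rsync above) while a hibernation attempt hangs, and to raise the freezer timeout as described, something like the following Python sketch could be used. It assumes a Linux /proc layout and the /sys/power/pm_freeze_timeout knob, which takes milliseconds; the 20.003 seconds in the log is consistent with the kernel's default 20-second freezer timeout. The helper names are hypothetical, and only thread-group leaders are scanned, not every thread.

```python
import os

# Sketch only: assumes Linux /proc. Scans thread-group leaders
# (/proc/<pid>), not individual threads (/proc/<pid>/task/<tid>).


def parse_proc_stat(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the last ')' rather than on whitespace.
    """
    head, _, tail = line.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    state = tail.split()[0]
    return int(pid_str), comm, state


def tasks_in_state(state="D", proc_root="/proc"):
    """Return (pid, comm) pairs for processes currently in `state`."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, st = parse_proc_stat(f.read())
        except OSError:
            # The process exited between listdir() and open(); skip it.
            continue
        if st == state:
            found.append((pid, comm))
    return found


def freeze_timeout_ms(minutes):
    """Convert minutes to the millisecond value pm_freeze_timeout expects."""
    return int(minutes * 60 * 1000)


def set_freeze_timeout(minutes, path="/sys/power/pm_freeze_timeout"):
    """Raise the freezer timeout; writing this file requires root."""
    with open(path, "w") as f:
        f.write(str(freeze_timeout_ms(minutes)))
```

For example, `set_freeze_timeout(5)` would write 300000 (the 5-minute timeout mentioned above), and calling `tasks_in_state("D")` from a separate shell while the freeze is hanging would be expected to include rsync in the scenario described. If nbdkit has already been frozen, it would show up in the freezer's refrigerator rather than as a D-state task blocked on I/O.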

However, I do not understand why the same is not happening for FUSE
(i.e., FUSE requests getting stuck because the FUSE daemon has already
been frozen). Was I just very lucky in my tests? Or are tasks waiting for
FUSE requests in a different kind of state? Or is NBD a red herring here,
and the real trouble is with ZFS?

It would be great if someone could shed some light on what's going on.


Best,
-Nikolaus

-- 
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«

