[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#741239: marked as done (task kjournald blocked for more than 120 seconds (very slow I/O, reboot required))



Your message dated Sat, 24 Apr 2021 14:34:34 -0700 (PDT)
with message-id <60848eea.1c69fb81.feac4.a4ed@mx.google.com>
and subject line Closing this bug (BTS maintenance for src:linux bugs)
has caused the Debian Bug report #741239,
regarding task kjournald blocked for more than 120 seconds (very slow I/O, reboot required)
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
741239: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=741239
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: linux-image-3.2.0-4-amd64
Version: 3.2.51-1

I had an image file mounted loopback. A few minutes after running rsync on
the mountpoint some bug was triggered and all I/O became extremely slow
after this:

Mar  9 11:21:20 st1 kernel: [1702551.406424] INFO: task kjournald:8705
blocked for more than 120 seconds.
Mar  9 11:21:20 st1 kernel: [1702551.406428] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar  9 11:21:20 st1 kernel: [1702551.406431] kjournald       D
ffff88041fb93780     0  8705      2 0x00000000
Mar  9 11:21:20 st1 kernel: [1702551.406436]  ffff880403ff2fe0
0000000000000046 ffffffff00000000 ffff88040f7a4f60
Mar  9 11:21:20 st1 kernel: [1702551.406442]  0000000000013780
ffff880406069fd8 ffff880406069fd8 ffff880403ff2fe0
Mar  9 11:21:20 st1 kernel: [1702551.406446]  ffffffff810135d2
0000000181066245 ffff8804048144b0 ffff88041fb93fd0
Mar  9 11:21:20 st1 kernel: [1702551.406451] Call Trace:
Mar  9 11:21:20 st1 kernel: [1702551.406459]  [<ffffffff810135d2>] ?
read_tsc+0x5/0x14
Mar  9 11:21:20 st1 kernel: [1702551.406464]  [<ffffffff8111d315>] ?
wait_on_buffer+0x28/0x28
Mar  9 11:21:20 st1 kernel: [1702551.406469]  [<ffffffff8134e141>] ?
io_schedule+0x59/0x71
Mar  9 11:21:20 st1 kernel: [1702551.406473]  [<ffffffff8111d31b>] ?
sleep_on_buffer+0x6/0xa
Mar  9 11:21:20 st1 kernel: [1702551.406476]  [<ffffffff8134e584>] ?
__wait_on_bit+0x3e/0x71
Mar  9 11:21:20 st1 kernel: [1702551.406480]  [<ffffffff8134e626>] ?
out_of_line_wait_on_bit+0x6f/0x78
Mar  9 11:21:20 st1 kernel: [1702551.406483]  [<ffffffff8111d315>] ?
wait_on_buffer+0x28/0x28
Mar  9 11:21:20 st1 kernel: [1702551.406489]  [<ffffffff8105fcad>] ?
autoremove_wake_function+0x2a/0x2a
Mar  9 11:21:20 st1 kernel: [1702551.406497]  [<ffffffffa05b156c>] ?
journal_commit_transaction+0x912/0xdce [jbd]
Mar  9 11:21:20 st1 kernel: [1702551.406502]  [<ffffffff81070fc1>] ?
arch_local_irq_save+0x11/0x17
Mar  9 11:21:20 st1 kernel: [1702551.406506]  [<ffffffff8134f209>] ?
_raw_spin_lock_irqsave+0x9/0x25
Mar  9 11:21:20 st1 kernel: [1702551.406510]  [<ffffffff8134f247>] ?
_raw_spin_unlock_irqrestore+0xe/0xf
Mar  9 11:21:20 st1 kernel: [1702551.406516]  [<ffffffffa05b4643>] ?
kjournald+0xe0/0x21e [jbd]
Mar  9 11:21:20 st1 kernel: [1702551.406521]  [<ffffffff8105fc83>] ?
add_wait_queue+0x3c/0x3c
Mar  9 11:21:20 st1 kernel: [1702551.406526]  [<ffffffffa05b4563>] ?
commit_timeout+0x5/0x5 [jbd]
Mar  9 11:21:20 st1 kernel: [1702551.406530]  [<ffffffff8105f631>] ?
kthread+0x76/0x7e
Mar  9 11:21:20 st1 kernel: [1702551.406535]  [<ffffffff81356374>] ?
kernel_thread_helper+0x4/0x10
Mar  9 11:21:20 st1 kernel: [1702551.406540]  [<ffffffff8105f5bb>] ?
kthread_worker_fn+0x139/0x139
Mar  9 11:21:20 st1 kernel: [1702551.406544]  [<ffffffff81356370>] ?
gs_change+0x13/0x13
Mar  9 11:21:20 st1 kernel: [1702551.406548] INFO: task rsync:4668 blocked
for more than 120 seconds.
Mar  9 11:21:20 st1 kernel: [1702551.406550] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar  9 11:21:20 st1 kernel: [1702551.406552] rsync           D
ffff88041fb13780     0  4668   8750 0x00000004
Mar  9 11:21:20 st1 kernel: [1702551.406557]  ffff880404a9e340
0000000000000086 ffff880100000000 ffff88040f775610
Mar  9 11:21:20 st1 kernel: [1702551.406561]  0000000000013780
ffff880406293fd8 ffff880406293fd8 ffff880404a9e340
Mar  9 11:21:20 st1 kernel: [1702551.406566]  0000000000000246
000000018134f209 ffff88041fdd5c18 ffff8803c84a7978
Mar  9 11:21:20 st1 kernel: [1702551.406570] Call Trace:
Mar  9 11:21:20 st1 kernel: [1702551.406575]  [<ffffffffa05b0513>] ?
do_get_write_access+0x1ad/0x367 [jbd]
Mar  9 11:21:20 st1 kernel: [1702551.406580]  [<ffffffff8105fcad>] ?
autoremove_wake_function+0x2a/0x2a
Mar  9 11:21:20 st1 kernel: [1702551.406589]  [<ffffffffa05df078>] ?
ext3_set_acl+0x87/0x1fc [ext3]
Mar  9 11:21:20 st1 kernel: [1702551.406594]  [<ffffffffa05b07e5>] ?
journal_get_write_access+0x21/0x38 [jbd]
Mar  9 11:21:20 st1 kernel: [1702551.406600]  [<ffffffffa05cac8f>] ?
__ext3_get_inode_loc+0xf4/0x2cf [ext3]
Mar  9 11:21:20 st1 kernel: [1702551.406607]  [<ffffffffa05dd459>] ?
__ext3_journal_get_write_access+0x1c/0x49 [ext3]
Mar  9 11:21:20 st1 kernel: [1702551.406614]  [<ffffffffa05df078>] ?
ext3_set_acl+0x87/0x1fc [ext3]
Mar  9 11:21:20 st1 kernel: [1702551.406620]  [<ffffffffa05cc2ed>] ?
ext3_reserve_inode_write+0x39/0x7b [ext3]
Mar  9 11:21:20 st1 kernel: [1702551.406625]  [<ffffffffa05af372>] ?
start_this_handle+0x2b8/0x31c [jbd]
Mar  9 11:21:20 st1 kernel: [1702551.406631]  [<ffffffffa05cc389>] ?
ext3_mark_inode_dirty+0x5a/0x79 [ext3]
Mar  9 11:21:20 st1 kernel: [1702551.406638]  [<ffffffffa05df078>] ?
ext3_set_acl+0x87/0x1fc [ext3]
Mar  9 11:21:20 st1 kernel: [1702551.406645]  [<ffffffffa05df2c8>] ?
ext3_xattr_set_acl+0xdb/0x11d [ext3]
Mar  9 11:21:20 st1 kernel: [1702551.406650]  [<ffffffff811130ce>] ?
generic_setxattr+0x69/0x6f
Mar  9 11:21:20 st1 kernel: [1702551.406654]  [<ffffffff8111375d>] ?
__vfs_setxattr_noperm+0x66/0xcd
Mar  9 11:21:20 st1 kernel: [1702551.406657]  [<ffffffff811138e5>] ?
setxattr+0x95/0xf0
Mar  9 11:21:20 st1 kernel: [1702551.406662]  [<ffffffff81036628>] ?
should_resched+0x5/0x23
Mar  9 11:21:20 st1 kernel: [1702551.406666]  [<ffffffff81113831>] ?
vfs_setxattr+0x6d/0x8c
Mar  9 11:21:20 st1 kernel: [1702551.406669]  [<ffffffff81036628>] ?
should_resched+0x5/0x23
Mar  9 11:21:20 st1 kernel: [1702551.406672]  [<ffffffff8111390d>] ?
setxattr+0xbd/0xf0
Mar  9 11:21:20 st1 kernel: [1702551.406678]  [<ffffffff810eb124>] ?
kmem_cache_free+0x2d/0x69
Mar  9 11:21:20 st1 kernel: [1702551.406683]  [<ffffffff81105d33>] ?
user_path_at_empty+0x53/0x7b
Mar  9 11:21:20 st1 kernel: [1702551.406688]  [<ffffffff81113aed>] ?
sys_setxattr+0x6a/0x8f
Mar  9 11:21:20 st1 kernel: [1702551.406692]  [<ffffffff81354212>] ?
system_call_fastpath+0x16/0x1b

The process 'rsync' became unkillable. These kernel messages continued for
the next 10 minutes or so with some variations.
Even after the messages stopped, I/O was still extremely slow... freezing up
all the time. With slow/freezing I/O I don't mean just to this mountpoint
(which actually seemed totally frozen) but also to/from the other disks.
This box is acting as a NFS server. Remote systems saw a continuing 'nfs:
server .. not respondig'.. followed by 'nfs: .. OK' a while later, and then
not responding again.

The exact situation was:
(rsync on..) ext3 -> loopback/DM -> ext4 -> DRBD -> RAID -> disks

I used kpartx to create loopback/DM devices based on an .img file which was
mounted in /data (ext4 on DRBD), then mounted /dev/mapper/loop1p1 on /mnt
(ext3), and ran an rsync on this /mnt.
All data was previously synchronized without ACL/EA's and I re-ran rsync
with -AX added. This means the rsync job was mostly synchronizing ACL's and
Extended Attributes, maybe that is a significant clue.

The machine has 16GB of RAM, of which typically 15G is available for
caching/buffers. At the time of the hang this was also the case, I checked
with 'free'.

Some possibly relevant vm sysctl settings:
vm.dirty_background_ratio = 10
vm.dirty_background_bytes = 0
vm.dirty_ratio = 0
vm.dirty_bytes = 15000000
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 3000
As you can see dirty bytes is very low (just 15M), so that rules out
kjournald blocking due to a very large dirty buffer.

Hopefully this report will help. Of course, if this is a known issue and
there's a solution available then I'd be eager to hear about it.

If you need any additional details or if I misfiled this bug, let me know.

ii  linux-image-3.2.0-4-amd64          3.2.51-1                      amd64
      Linux 3.2 for 64-bit PCs

--- End Message ---
--- Begin Message ---
Hi

This bug was filed for a very old kernel or the bug is old itself
without resolution.

If you can reproduce it with

- the current version in unstable/testing
- the latest kernel from backports

please reopen the bug, see https://www.debian.org/Bugs/server-control
for details.

Regards,
Salvatore

--- End Message ---

Reply to: