[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: Bug with soft raid?



On 12.02.2019 16:08, steve wrote:
Hi There,

Here is what I get on my up to date stretch box:

Feb 10 12:19:12 box kernel: [ 7734.562060] INFO: task md0_raid1:461 blocked for more than 120 seconds.
Feb 10 12:19:12 box kernel: [ 7734.562068]       Tainted: P           OE     4.19.0-0.bpo.1-amd64 #1 Debian 4.19.12-1~bpo9+1
Feb 10 12:19:12 box kernel: [ 7734.562070] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 10 12:19:12 box kernel: [ 7734.562074] md0_raid1       D    0   461      2 0x80000000
Feb 10 12:19:12 box kernel: [ 7734.562078] Call Trace:
Feb 10 12:19:12 box kernel: [ 7734.562093]  ? __schedule+0x3f5/0x880
Feb 10 12:19:12 box kernel: [ 7734.562098]  schedule+0x32/0x80
Feb 10 12:19:12 box kernel: [ 7734.562112]  md_super_wait+0x6e/0xa0 [md_mod]
Feb 10 12:19:12 box kernel: [ 7734.562120]  ? remove_wait_queue+0x60/0x60
Feb 10 12:19:12 box kernel: [ 7734.562128]  md_update_sb.part.61+0x4af/0x910 [md_mod]
Feb 10 12:19:12 box kernel: [ 7734.562136]  md_check_recovery+0x312/0x530 [md_mod]
Feb 10 12:19:12 box kernel: [ 7734.562143]  raid1d+0x60/0x8c0 [raid1]
Feb 10 12:19:12 box kernel: [ 7734.562149]  ? schedule+0x32/0x80
Feb 10 12:19:12 box kernel: [ 7734.562152]  ? schedule_timeout+0x1e5/0x350
Feb 10 12:19:12 box kernel: [ 7734.562159]  ? md_thread+0x125/0x170 [md_mod]
Feb 10 12:19:12 box kernel: [ 7734.562165]  md_thread+0x125/0x170 [md_mod]
Feb 10 12:19:12 box kernel: [ 7734.562170]  ? remove_wait_queue+0x60/0x60
Feb 10 12:19:12 box kernel: [ 7734.562174]  kthread+0xf8/0x130
Feb 10 12:19:12 box kernel: [ 7734.562180]  ? md_rdev_init+0xc0/0xc0 [md_mod]
Feb 10 12:19:12 box kernel: [ 7734.562183]  ? kthread_create_worker_on_cpu+0x70/0x70
Feb 10 12:19:12 box kernel: [ 7734.562186]  ret_from_fork+0x35/0x40
Feb 10 12:19:12 box kernel: [ 7734.562192] INFO: task md1_raid1:468 blocked for more than 120 seconds.
Feb 10 12:19:12 box kernel: [ 7734.562196]       Tainted: P           OE     4.19.0-0.bpo.1-amd64 #1 Debian 4.19.12-1~bpo9+1
Feb 10 12:19:12 box kernel: [ 7734.562198] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 10 12:19:12 box kernel: [ 7734.562200] md1_raid1       D    0   468      2 0x80000000
Feb 10 12:19:12 box kernel: [ 7734.562203] Call Trace:
Feb 10 12:19:12 box kernel: [ 7734.562209]  ? __schedule+0x3f5/0x880
Feb 10 12:19:12 box kernel: [ 7734.562213]  schedule+0x32/0x80
Feb 10 12:19:12 box kernel: [ 7734.562219]  md_super_wait+0x6e/0xa0 [md_mod]
Feb 10 12:19:12 box kernel: [ 7734.562224]  ? remove_wait_queue+0x60/0x60
Feb 10 12:19:12 box kernel: [ 7734.562230]  md_update_sb.part.61+0x4af/0x910 [md_mod]
Feb 10 12:19:12 box kernel: [ 7734.562237]  md_check_recovery+0x312/0x530 [md_mod]
Feb 10 12:19:12 box kernel: [ 7734.562241]  raid1d+0x60/0x8c0 [raid1]
Feb 10 12:19:12 box kernel: [ 7734.562246]  ? schedule+0x32/0x80
Feb 10 12:19:12 box kernel: [ 7734.562248]  ? schedule_timeout+0x1e5/0x350
Feb 10 12:19:12 box kernel: [ 7734.562255]  ? md_thread+0x125/0x170 [md_mod]
Feb 10 12:19:12 box kernel: [ 7734.562260]  md_thread+0x125/0x170 [md_mod]
Feb 10 12:19:12 box kernel: [ 7734.562265]  ? remove_wait_queue+0x60/0x60
Feb 10 12:19:12 box kernel: [ 7734.562268]  kthread+0xf8/0x130
Feb 10 12:19:12 box kernel: [ 7734.562274]  ? md_rdev_init+0xc0/0xc0 [md_mod]
Feb 10 12:19:12 box kernel: [ 7734.562276]  ? kthread_create_worker_on_cpu+0x70/0x70
Feb 10 12:19:12 box kernel: [ 7734.562279]  ret_from_fork+0x35/0x40
Feb 10 12:19:12 box kernel: [ 7734.562366] INFO: task fetchmail:3277 blocked for more than 120 seconds.
Feb 10 12:19:12 box kernel: [ 7734.562369]       Tainted: P           OE     4.19.0-0.bpo.1-amd64 #1 Debian 4.19.12-1~bpo9+1
Feb 10 12:19:12 box kernel: [ 7734.562371] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 10 12:19:12 box kernel: [ 7734.562373] fetchmail       D    0  3277      1 0x00000080
Feb 10 12:19:12 box kernel: [ 7734.562376] Call Trace:
Feb 10 12:19:12 box kernel: [ 7734.562381]  ? __schedule+0x3f5/0x880
Feb 10 12:19:12 box kernel: [ 7734.562412]  ? __ext4_handle_dirty_metadata+0x51/0x1a0 [ext4]
Feb 10 12:19:12 box kernel: [ 7734.562418]  schedule+0x32/0x80
Feb 10 12:19:12 box kernel: [ 7734.562426]  md_write_start+0x13c/0x210 [md_mod]
Feb 10 12:19:12 box kernel: [ 7734.562432]  ? remove_wait_queue+0x60/0x60
Feb 10 12:19:12 box kernel: [ 7734.562437]  raid1_make_request+0x63/0xbc0 [raid1]
Feb 10 12:19:12 box kernel: [ 7734.562442]  ? mempool_alloc+0x69/0x190
Feb 10 12:19:12 box kernel: [ 7734.562446]  ? bvec_alloc+0x86/0xe0
Feb 10 12:19:12 box kernel: [ 7734.562453]  md_handle_request+0x116/0x190 [md_mod]
Feb 10 12:19:12 box kernel: [ 7734.562460]  md_make_request+0x72/0x170 [md_mod]
Feb 10 12:19:12 box kernel: [ 7734.562465]  generic_make_request+0x1e7/0x410
Feb 10 12:19:12 box kernel: [ 7734.562471]  ? submit_bio+0x6c/0x140
Feb 10 12:19:12 box kernel: [ 7734.562474]  submit_bio+0x6c/0x140
Feb 10 12:19:12 box kernel: [ 7734.562500]  ext4_io_submit+0x49/0x60 [ext4]
Feb 10 12:19:12 box kernel: [ 7734.562519]  ext4_writepages+0x621/0xe00 [ext4]
Feb 10 12:19:12 box kernel: [ 7734.562526]  ? do_writepages+0x1a/0x60
Feb 10 12:19:12 box kernel: [ 7734.562529]  do_writepages+0x1a/0x60
Feb 10 12:19:12 box kernel: [ 7734.562533]  __filemap_fdatawrite_range+0xc8/0x100
Feb 10 12:19:12 box kernel: [ 7734.562552]  ext4_rename+0x676/0x900 [ext4]


and so one.

The system blocks for about 3 minutes and then I get back a hand on it.

I'm no specialist so I don't know what to do.
Any help would appreciated.

Best,
Steve

Seems like this behavior is explained in this article, including the solution. [1] Not a bug, but something that requires an optimization settings.
*But*, I'd first checked the disks for hardware problems and md array for consistency, after complete data backup is made and verified, of course, like David already suggested.

Depending on the drives they could be "slow" because of active APM putting them to sleep state and causing them to spin down and park heads, or they could be "slow" because of bad blocks are found and drive's firmware trying to reallocate them. Bottom line is, you have to ensure your hard drives are ok.
To investigate further, more information is required.
Get output of this command for each device:
    $ sudo smartctl --all /dev/sda
Also check current state of your md array:
    $ sudo mdadm --detail --scan --verbose
    $ cat /proc/mdstat

[1] https://globalroot.wordpress.com/2015/02/13/linux-error-message-task-blocked-for-more-than-120-seconds/

-- 
With kindest regards, Alexander.

⢀⣴⠾⠻⢶⣦⠀ 
⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org
⠈⠳⣄⠀⠀⠀⠀ 

Reply to: