
INFO: task blocked for more than 120 seconds



Hello everyone,

Since I switched machines, I have been getting sporadic freezes. Not
everything freezes, though: an 'ls' hangs while other processes keep
running normally. Neither the mouse nor the keyboard is affected. The
logs show this:

1 box kernel: [ 3988.692306] INFO: task md1_raid1:406 blocked for more than 120 seconds.
1 box kernel: [ 3988.692314] Tainted: P OE 4.19.0-0.bpo.2-amd64 #1 Debian 4.19.16-1~bpo9+1
1 box kernel: [ 3988.692316] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
1 box kernel: [ 3988.692320] md1_raid1 D 0 406 2 0x80000000
1 box kernel: [ 3988.692324] Call Trace:
1 box kernel: [ 3988.692337] ? __schedule+0x3f5/0x880
1 box kernel: [ 3988.692342] schedule+0x32/0x80
1 box kernel: [ 3988.692356] md_super_wait+0x6e/0xa0 [md_mod]
1 box kernel: [ 3988.692365] ? remove_wait_queue+0x60/0x60
1 box kernel: [ 3988.692373] md_update_sb.part.61+0x4af/0x910 [md_mod]
1 box kernel: [ 3988.692381] md_check_recovery+0x312/0x530 [md_mod]
1 box kernel: [ 3988.692388] raid1d+0x60/0x8c0 [raid1]
1 box kernel: [ 3988.692394] ? schedule+0x32/0x80
1 box kernel: [ 3988.692398] ? schedule_timeout+0x1e5/0x350
1 box kernel: [ 3988.692405] ? md_thread+0x125/0x170 [md_mod]
1 box kernel: [ 3988.692411] md_thread+0x125/0x170 [md_mod]
1 box kernel: [ 3988.692416] ? remove_wait_queue+0x60/0x60
1 box kernel: [ 3988.692420] kthread+0xf8/0x130
1 box kernel: [ 3988.692427] ? md_rdev_init+0xc0/0xc0 [md_mod]
1 box kernel: [ 3988.692430] ? kthread_create_worker_on_cpu+0x70/0x70
1 box kernel: [ 3988.692433] ret_from_fork+0x35/0x40
1 box kernel: [ 3988.692438] INFO: task md0_raid1:411 blocked for more than 120 seconds.
1 box kernel: [ 3988.692441] Tainted: P OE 4.19.0-0.bpo.2-amd64 #1 Debian 4.19.16-1~bpo9+1
1 box kernel: [ 3988.692443] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
1 box kernel: [ 3988.692446] md0_raid1 D 0 411 2 0x80000000
[...]
1 box kernel: [ 3988.692539] INFO: task jbd2/md0-8:985 blocked for more than 120 seconds.
1 box kernel: [ 3988.692542] Tainted: P OE 4.19.0-0.bpo.2-amd64 #1 Debian 4.19.16-1~bpo9+1
1 box kernel: [ 3988.692544] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
1 box kernel: [ 3988.692546] jbd2/md0-8 D 0 985 2 0x80000000
1 box kernel: [ 3988.692549] Call Trace:
[...]
1 box kernel: [ 3988.692730] INFO: task jbd2/md1-8:994 blocked for more than 120 seconds.
1 box kernel: [ 3988.692733] Tainted: P OE 4.19.0-0.bpo.2-amd64 #1 Debian 4.19.16-1~bpo9+1
1 box kernel: [ 3988.692735] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
1 box kernel: [ 3988.692737] jbd2/md1-8 D 0 994 2 0x80000000
[...]
1 box kernel: [ 3988.692896] INFO: task uptimed:1161 blocked for more than 120 seconds.
1 box kernel: [ 3988.692899] Tainted: P OE 4.19.0-0.bpo.2-amd64 #1 Debian 4.19.16-1~bpo9+1
1 box kernel: [ 3988.692901] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
1 box kernel: [ 3988.692904] uptimed D 0 1161 1 0x00000080
1 box kernel: [ 3988.692906] Call Trace:
[...]
1 box kernel: [ 3988.693069] RIP: 0033:0x7fdf53aaa6f0
1 box kernel: [ 3988.693076] Code: Bad RIP value.
1 box kernel: [ 3988.693078] RSP: 002b:00007ffece358a28 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
1 box kernel: [ 3988.693082] RAX: ffffffffffffffda RBX: 0000564ce8e167b0 RCX: 00007fdf53aaa6f0
1 box kernel: [ 3988.693083] RDX: 00000000000001b6 RSI: 0000000000000241 RDI: 00007fdf53d702b0
1 box kernel: [ 3988.693085] RBP: 0000000000000004 R08: 0000000000000004 R09: 0000000000000001
1 box kernel: [ 3988.693087] R10: 0000000000000240 R11: 0000000000000246 R12: 00007fdf53d7042d
1 box kernel: [ 3988.693088] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
1 box kernel: [ 3988.693119] INFO: task fetchmail:3244 blocked for more than 120 seconds.
1 box kernel: [ 3988.693122] Tainted: P OE 4.19.0-0.bpo.2-amd64 #1 Debian 4.19.16-1~bpo9+1
1 box kernel: [ 3988.693124] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
1 box kernel: [ 3988.693127] fetchmail D 0 3244 1 0x00000080
1 box kernel: [ 3988.693129] Call Trace:
1 box kernel: [ 3988.693331] RIP: 0033:0x7ff77cd21970
[...]
1 box kernel: [ 3988.693335] Code: Bad RIP value.
1 box kernel: [ 3988.693336] RSP: 002b:00007ffd2b5b26f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
1 box kernel: [ 3988.693339] RAX: ffffffffffffffda RBX: 0000000000000050 RCX: 00007ff77cd21970
1 box kernel: [ 3988.693341] RDX: 0000000000000050 RSI: 000055b1c5456900 RDI: 0000000000000001
1 box kernel: [ 3988.693343] RBP: 000055b1c5456900 R08: 00007ff77cfe1760 R09: 00007ff77e682740
1 box kernel: [ 3988.693344] R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000050
1 box kernel: [ 3988.693346] R13: 0000000000000001 R14: 00007ff77cfe0600 R15: 0000000000000050
1 box kernel: [ 3988.693362] INFO: task kworker/u56:0:7704 blocked for more than 120 seconds.
1 box kernel: [ 3988.693365] Tainted: P OE 4.19.0-0.bpo.2-amd64 #1 Debian 4.19.16-1~bpo9+1
1 box kernel: [ 3988.693367] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
1 box kernel: [ 3988.693370] kworker/u56:0 D 0 7704 2 0x80000080
1 box kernel: [ 3988.693378] Workqueue: writeback wb_workfn (flush-9:0)
[...]
1 box kernel: [ 3988.693635] INFO: task kworker/u56:2:10260 blocked for more than 120 seconds.
1 box kernel: [ 3988.693639] Tainted: P OE 4.19.0-0.bpo.2-amd64 #1 Debian 4.19.16-1~bpo9+1
1 box kernel: [ 3988.693640] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
1 box kernel: [ 3988.693643] kworker/u56:2 D 0 10260 2 0x80000080
1 box kernel: [ 3988.693650] Workqueue: writeback wb_workfn (flush-9:1)
1 box kernel: [ 3988.693804] INFO: task lpqd:10309 blocked for more than 120 seconds.
[...]
1 box kernel: [ 3988.693806] Tainted: P OE 4.19.0-0.bpo.2-amd64 #1 Debian 4.19.16-1~bpo9+1
1 box kernel: [ 3988.693808] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
1 box kernel: [ 3988.693811] lpqd D 0 10309 2682 0x00000080
1 box kernel: [ 3988.693813] Call Trace:
[...]
1 box kernel: [ 3988.693949] RIP: 0033:0x7f98a07789e7
1 box kernel: [ 3988.693953] Code: Bad RIP value.
1 box kernel: [ 3988.693954] RSP: 002b:00007ffcce9c2e58 EFLAGS: 00000202 ORIG_RAX: 0000000000000031
1 box kernel: [ 3988.693957] RAX: ffffffffffffffda RBX: 0000564564def780 RCX: 00007f98a07789e7
1 box kernel: [ 3988.693959] RDX: 000000000000006e RSI: 00007ffcce9c2f40 RDI: 0000000000000007
1 box kernel: [ 3988.693960] RBP: 0000564564e17360 R08: 00007f98a0a28f78 R09: 0000000000000410
1 box kernel: [ 3988.693962] R10: 00000000000002f0 R11: 0000000000000202 R12: 0000000000000007
1 box kernel: [ 3988.693963] R13: 00007ffcce9c2f40 R14: 0000564564e177f8 R15: 0000564564e188b0
1 box kernel: [ 4109.529828] INFO: task systemd:1 blocked for more than 120 seconds.
1 box kernel: [ 4109.529836] Tainted: P OE 4.19.0-0.bpo.2-amd64 #1 Debian 4.19.16-1~bpo9+1
1 box kernel: [ 4109.529838] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
1 box kernel: [ 4109.529841] systemd D 0 1 0 0x00000000
1 box kernel: [ 4109.529846] Call Trace:
[...]
1 box kernel: [ 4109.530016] RIP: 0033:0x7fca74ec2687
1 box kernel: [ 4109.530023] Code: Bad RIP value.
1 box kernel: [ 4109.530025] RSP: 002b:00007ffc280fb378 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
1 box kernel: [ 4109.530029] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fca74ec2687
1 box kernel: [ 4109.530030] RDX: 00000000000002ee RSI: 00000000000001c0 RDI: 000055d88ed7d220
1 box kernel: [ 4109.530032] RBP: 000000000003a2f8 R08: 0000000000000000 R09: 0000000000000070
1 box kernel: [ 4109.530034] R10: 0000000000000000 R11: 0000000000000246 R12: 000055d88ed7d272
1 box kernel: [ 4109.530036] R13: 8421084210842109 R14: 00000000000000c2 R15: 00007fca74f50540

I removed the call trace details ([...]) to keep this from getting too long.

Several processes are clearly blocked (md1_raid1, md0_raid1, uptimed,
fetchmail, kworker, lpqd and systemd). I suspected a faulty disk in a
RAID 1 array, so I removed it, which lengthened the time between two
freezes. I also tried older kernel versions, with the same result. I
read somewhere on the Internet that this could be caused by a slow
machine, but that is far from being my case.
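
To list the blocked tasks without reading the whole log by eye, something
like the following could work. It is only a minimal sketch in Python: the
names HUNG_TASK, blocked_tasks and SAMPLE are made up for illustration, and
the pattern assumes task names contain no whitespace, which holds for every
task in the log above.

```python
import re

# Header line printed by the kernel's hung-task detector, e.g.
# "INFO: task md1_raid1:406 blocked for more than 120 seconds."
# Assumption: task names contain no whitespace; the PID is the number
# after the last colon before " blocked".
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>\S+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def blocked_tasks(log_text):
    """Return unique (task name, pid) pairs in order of first report."""
    seen = []
    for match in HUNG_TASK.finditer(log_text):
        entry = (match.group("name"), int(match.group("pid")))
        if entry not in seen:
            seen.append(entry)
    return seen


# A few header lines copied from the log above.
SAMPLE = """\
1 box kernel: [ 3988.692306] INFO: task md1_raid1:406 blocked for more than 120 seconds.
1 box kernel: [ 3988.692539] INFO: task jbd2/md0-8:985 blocked for more than 120 seconds.
1 box kernel: [ 3988.693362] INFO: task kworker/u56:0:7704 blocked for more than 120 seconds.
1 box kernel: [ 4109.529828] INFO: task systemd:1 blocked for more than 120 seconds.
"""

if __name__ == "__main__":
    for name, pid in blocked_tasks(SAMPLE):
        print(f"{name} (pid {pid})")
```

Run against the full log, this also picks up jbd2/md0-8 and jbd2/md1-8, the
ext4 journal threads for both arrays.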

I am completely out of ideas.

How about you?

Thanks in advance,
Steve

