
[Nbd] oops with timeout option on debian wheezy kernel



Dear nbd kernel maintainers,

According to a mail from 21 July 2011 (http://comments.gmane.org/gmane.linux.drivers.nbd.general/985),
this is caused by the ioctl(nbd, NBD_SET_TIMEOUT, timeout) call not working.

Here on Debian wheezy (Linux www1 3.2.0-4-amd64 #1 SMP Debian 3.2.46-1+deb7u1 x86_64 GNU/Linux), disconnecting the server host
makes any "cat /proc/mdstat" hang on the RAID1 array built on top of the nbd device,
and produces the appended kernel oopses (only two shown). Only
bringing the nbd server back allows further commands on the md device.

Any news or ideas on this subject?
Thanks,
 greetings
  Hermann

-- 
Network Administration/Central Services, Interdisziplinaeres 
Zentrum fuer wissenschaftliches Rechnen der Universitaet Heidelberg
IWR; INF 368; 69120 Heidelberg; Tel: (06221)54-8236 Fax: -5224
Email: Hermann.Lauer@...1489...
Feb  5 15:00:07 www1 kernel: [13830012.087036] RAID1 conf printout:
Feb  5 15:00:07 www1 kernel: [13830012.087040]  --- wd:1 rd:2
Feb  5 15:00:07 www1 kernel: [13830012.087043]  disk 0, wo:0, o:1, dev:sdb
Feb  5 15:00:07 www1 kernel: [13830012.087046]  disk 1, wo:1, o:1, dev:nbd2
Feb  5 15:00:07 www1 kernel: [13830012.087254] md: recovery of RAID array md2
Feb  5 15:00:07 www1 kernel: [13830012.107754] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
Feb  5 15:00:07 www1 kernel: [13830012.137583] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Feb  5 15:00:07 www1 kernel: [13830012.185260] md: using 128k window, over a total of 292935844k.
Feb  5 15:48:38 www1 kernel: [13832918.862727] md: md2: recovery done.
Feb  5 15:48:38 www1 kernel: [13832918.899862] RAID1 conf printout:
Feb  5 15:48:38 www1 kernel: [13832918.899866]  --- wd:2 rd:2
Feb  5 15:48:38 www1 kernel: [13832918.899869]  disk 0, wo:0, o:1, dev:sdb
Feb  5 15:48:38 www1 kernel: [13832918.899872]  disk 1, wo:0, o:1, dev:nbd2
Feb  5 16:20:53 www1 kernel: [13834851.077695] INFO: task md2_raid1:392 blocked for more than 120 seconds.
Feb  5 16:20:53 www1 kernel: [13834851.114065] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  5 16:20:53 www1 kernel: [13834851.159961] md2_raid1       D ffff88061fc53780     0   392      2 0x00000000
Feb  5 16:20:53 www1 kernel: [13834851.188043]  ffff880310326e60 0000000000000046 0000000000000000 ffff880313331650
Feb  5 16:20:53 www1 kernel: [13834851.229957]  0000000000013780 ffff8803113bbfd8 ffff8803113bbfd8 ffff880310326e60
Feb  5 16:20:53 www1 kernel: [13834851.243764]  0000000000000246 000000018134eb89 ffff8806113c3e80 ffff8806113c3c00
Feb  5 16:20:53 www1 kernel: [13834851.270772] Call Trace:
Feb  5 16:20:54 www1 kernel: [13834851.286010]  [<ffffffffa010c6ed>] ? md_super_wait+0x6a/0x80 [md_mod]
Feb  5 16:20:54 www1 kernel: [13834851.306730]  [<ffffffff8105fadf>] ? add_wait_queue+0x3c/0x3c
Feb  5 16:20:54 www1 kernel: [13834851.328712]  [<ffffffffa010ca85>] ? md_update_sb+0x382/0x474 [md_mod]
Feb  5 16:20:54 www1 kernel: [13834851.356998]  [<ffffffffa010d2f1>] ? md_check_recovery+0x218/0x514 [md_mod]
Feb  5 16:20:54 www1 kernel: [13834851.389643]  [<ffffffffa0036446>] ? raid1d+0x3d/0xbb7 [raid1]
Feb  5 16:20:54 www1 kernel: [13834851.420594]  [<ffffffff81039982>] ? finish_task_switch+0x88/0xb9
Feb  5 16:20:54 www1 kernel: [13834851.454552]  [<ffffffff8134d811>] ? __schedule+0x5f9/0x610
Feb  5 16:20:54 www1 kernel: [13834851.479579]  [<ffffffff8134dcdb>] ? schedule_timeout+0x2c/0xdb
Feb  5 16:20:54 www1 kernel: [13834851.512461]  [<ffffffff81070e05>] ? arch_local_irq_save+0x11/0x17
Feb  5 16:20:54 www1 kernel: [13834851.532763]  [<ffffffffa0107253>] ? md_thread+0x114/0x132 [md_mod]
Feb  5 16:20:54 www1 kernel: [13834851.544613]  [<ffffffff8105fadf>] ? add_wait_queue+0x3c/0x3c
Feb  5 16:20:54 www1 kernel: [13834851.570418]  [<ffffffffa010713f>] ? md_rdev_init+0xea/0xea [md_mod]
Feb  5 16:20:54 www1 kernel: [13834851.590348]  [<ffffffff8105f48d>] ? kthread+0x76/0x7e
Feb  5 16:20:54 www1 kernel: [13834851.608264]  [<ffffffff81355cf4>] ? kernel_thread_helper+0x4/0x10
Feb  5 16:20:54 www1 kernel: [13834851.642318]  [<ffffffff8105f417>] ? kthread_worker_fn+0x139/0x139
Feb  5 16:20:54 www1 kernel: [13834851.671223]  [<ffffffff81355cf0>] ? gs_change+0x13/0x13
Feb  5 16:20:54 www1 kernel: [13834851.699173] INFO: task kjournald:1813 blocked for more than 120 seconds.
Feb  5 16:20:54 www1 kernel: [13834851.728096] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  5 16:20:54 www1 kernel: [13834851.761423] kjournald       D ffff88031fc13780     0  1813      2 0x00000000
Feb  5 16:20:54 www1 kernel: [13834851.792189]  ffff8803109f6f20 0000000000000046 0000000000000000 ffffffff8160d020
Feb  5 16:20:54 www1 kernel: [13834851.818791]  0000000000013780 ffff880310135fd8 ffff880310135fd8 ffff8803109f6f20
Feb  5 16:20:54 www1 kernel: [13834851.840895]  0000000000000246 000000018134eb89 ffff8806113c3e80 ffff8806113c3c00
Feb  5 16:20:54 www1 kernel: [13834851.860978] Call Trace:
Feb  5 16:20:54 www1 kernel: [13834851.862575]  [<ffffffffa010b11f>] ? md_write_start+0x133/0x149 [md_mod]
Feb  5 16:20:54 www1 kernel: [13834851.874905]  [<ffffffff8105fadf>] ? add_wait_queue+0x3c/0x3c
Feb  5 16:20:54 www1 kernel: [13834851.893506]  [<ffffffffa0037898>] ? make_request+0x3f/0xa69 [raid1]
Feb  5 16:20:54 www1 kernel: [13834851.923856]  [<ffffffff810b4683>] ? find_get_page+0x40/0x62
Feb  5 16:20:54 www1 kernel: [13834851.951893]  [<ffffffff810be11d>] ? put_page+0x18/0x27
Feb  5 16:20:54 www1 kernel: [13834851.969960]  [<ffffffffa0106d44>] ? md_make_request+0xee/0x1db [md_mod]
Feb  5 16:20:54 www1 kernel: [13834851.983105]  [<ffffffff81198c3e>] ? generic_make_request+0x90/0xcf
Feb  5 16:20:54 www1 kernel: [13834852.003686]  [<ffffffff81198d50>] ? submit_bio+0xd3/0xf1
Feb  5 16:20:54 www1 kernel: [13834852.025657]  [<ffffffff81120a1e>] ? bio_alloc_bioset+0x43/0xb6
Feb  5 16:20:54 www1 kernel: [13834852.053623]  [<ffffffff8111c99c>] ? submit_bh+0xe2/0xff
Feb  5 16:20:54 www1 kernel: [13834852.076585]  [<ffffffffa012e4d6>] ? journal_commit_transaction+0x87c/0xdce [jbd]
Feb  5 16:20:54 www1 kernel: [13834852.097533]  [<ffffffff81070e05>] ? arch_local_irq_save+0x11/0x17
Feb  5 16:20:54 www1 kernel: [13834852.133488]  [<ffffffff8134eb89>] ? _raw_spin_lock_irqsave+0x9/0x25
Feb  5 16:20:54 www1 kernel: [13834852.159446]  [<ffffffff8134ebc7>] ? _raw_spin_unlock_irqrestore+0xe/0xf
Feb  5 16:20:54 www1 kernel: [13834852.197454]  [<ffffffffa0131643>] ? kjournald+0xe0/0x21e [jbd]
Feb  5 16:20:54 www1 kernel: [13834852.223364]  [<ffffffff8105fadf>] ? add_wait_queue+0x3c/0x3c
Feb  5 16:20:54 www1 kernel: [13834852.246347]  [<ffffffffa0131563>] ? commit_timeout+0x5/0x5 [jbd]
Feb  5 16:20:54 www1 kernel: [13834852.266605]  [<ffffffff8105f48d>] ? kthread+0x76/0x7e
Feb  5 16:20:54 www1 kernel: [13834852.279943]  [<ffffffff81355cf4>] ? kernel_thread_helper+0x4/0x10
Feb  5 16:20:55 www1 kernel: [13834852.310534]  [<ffffffff8105f417>] ? kthread_worker_fn+0x139/0x139
Feb  5 16:20:55 www1 kernel: [13834852.325018]  [<ffffffff81355cf0>] ? gs_change+0x13/0x13
Feb  5 16:22:55 www1 kernel: [13834972.160196] INFO: task md2_raid1:392 blocked for more than 120 seconds.
Feb  5 16:22:55 www1 kernel: [13834972.192500] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
...
Feb  5 16:34:33 www1 kernel: [13835669.383012] block nbd2: NBD_DISCONNECT
Feb  5 16:35:12 www1 kernel: [13835708.767415] block nbd2: Receive control failed (result -32)
Feb  5 16:35:12 www1 kernel: [13835708.792213] block nbd2: queue cleared
