aacraid: Host adapter reset request. SCSI hang ?

To: debian-kernel@lists.debian.org
Subject: aacraid: Host adapter reset request. SCSI hang ?
From: Camaleón <noelamac+gmane@gmail.com>
Date: Sat, 25 Oct 2014 19:35:45 +0000 (UTC)
Message-id: <pan.2014.10.25.19.35.44@gmail.com>

Hello,

Not sure if this is the right place for this, but before going to the BTS
I'd better ask here for some advice.

Recently (since October 8, and not before), some of my servers running an
up-to-date Wheezy with an aacraid card (Adaptec 2020SA) have been going nuts:

(...)
Oct 12 07:38:34 my_machine kernel: [3007914.062687] aacraid: Host adapter abort request (0,0,0,0)
Oct 12 07:38:34 my_machine kernel: [3007914.065137] aacraid: Host adapter reset request. SCSI hang ?
Oct 12 07:38:41 my_machine kernel: [3007920.532027] INFO: task kworker/2:1:37 blocked for more than 120 seconds.
Oct 12 07:38:41 my_machine kernel: [3007920.535101] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 12 07:38:41 my_machine kernel: [3007920.538765] kworker/2:1     D ffff88022fd13780     0    37      2 0x00000000
Oct 12 07:38:41 my_machine kernel: [3007920.538803]  ffff8802258a3180 0000000000000046 0000000000000000 ffff880226cd0780
Oct 12 07:38:41 my_machine kernel: [3007920.538816]  0000000000013780 ffff88022591bfd8 ffff88022591bfd8 ffff8802258a3180
Oct 12 07:38:41 my_machine kernel: [3007920.538901]  0000000000000001 0000000100000400 0000000000000001 ffff8802259c7be8
Oct 12 07:38:41 my_machine kernel: [3007920.538987] Call Trace:
Oct 12 07:38:41 my_machine kernel: [3007920.539003]  [<ffffffff8134fa44>] ? __mutex_lock_common.isra.5+0xff/0x164
Oct 12 07:38:41 my_machine kernel: [3007920.539012]  [<ffffffff8135049f>] ? _raw_spin_unlock_irqrestore+0xe/0xf
Oct 12 07:38:41 my_machine kernel: [3007920.539021]  [<ffffffff8134f932>] ? mutex_lock+0x1a/0x2d
Oct 12 07:38:41 my_machine kernel: [3007920.539077]  [<ffffffffa0093c82>] ? reiserfs_mutex_lock_safe+0x19/0x24 [reiserfs]
Oct 12 07:38:41 my_machine kernel: [3007920.539096]  [<ffffffffa0095059>] ? flush_commit_list+0x11b/0x4fc [reiserfs]
Oct 12 07:38:41 my_machine kernel: [3007920.539105]  [<ffffffff8134f144>] ? _cond_resched+0x7/0x1c
Oct 12 07:38:41 my_machine kernel: [3007920.539123]  [<ffffffffa0095b44>] ? flush_async_commits+0x3b/0x46 [reiserfs]
Oct 12 07:38:41 my_machine kernel: [3007920.539134]  [<ffffffff8105b5f7>] ? process_one_work+0x161/0x269
Oct 12 07:38:41 my_machine kernel: [3007920.539142]  [<ffffffff8105c5c0>] ? worker_thread+0xc2/0x145
Oct 12 07:38:41 my_machine kernel: [3007920.539174]  [<ffffffff8105c4fe>] ? manage_workers.isra.25+0x15b/0x15b
Oct 12 07:38:41 my_machine kernel: [3007920.539183]  [<ffffffff8105f701>] ? kthread+0x76/0x7e
Oct 12 07:38:41 my_machine kernel: [3007920.539216]  [<ffffffff813575b4>] ? kernel_thread_helper+0x4/0x10
Oct 12 07:38:41 my_machine kernel: [3007920.539250]  [<ffffffff8105f68b>] ? kthread_worker_fn+0x139/0x139
Oct 12 07:38:41 my_machine kernel: [3007920.539282]  [<ffffffff813575b0>] ? gs_change+0x13/0x13

The systems do not really hang, but all operations are carried out slowly,
and after some minutes they return to normal. Most recently I triggered
these messages simply by running an update (apt-get update && apt-get -V
dist-upgrade), but since the messages can appear again I'm not sure what
the best next step would be.

Adaptec has a KB article¹ about this, but checking the "timeout" value it
suggests, mine are already set to 45 seconds and the message still comes
out. Should I increase the timeout to something larger, like 180? Any tips
would be appreciated.
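For reference, assuming the "timeout" in the KB article is the per-device SCSI command timeout that Linux exposes in sysfs as /sys/block/sdX/device/timeout (an assumption; the article itself is not quoted here), a minimal sketch for listing and raising it on every sd* disk could look like this. SYSFS_ROOT is a hypothetical override added only so the functions can be tried against a scratch directory, and 180 is just the example value from the question. Writing to sysfs requires root, and the change does not survive a reboot.

```shell
#!/bin/sh
# Sketch only: list and optionally raise the SCSI command timeout (seconds)
# for every sd* disk. Assumes the aacraid logical drives show up as
# /sys/block/sd*. SYSFS_ROOT is a hypothetical override for testing.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print "<device> <timeout>" for each disk that has a timeout attribute.
show_timeouts() {
    for t in "$SYSFS_ROOT"/block/sd*/device/timeout; do
        [ -e "$t" ] || continue          # glob matched nothing
        dev=${t#"$SYSFS_ROOT"/block/}    # strip the leading sysfs path
        dev=${dev%%/*}                   # keep only the device name
        printf '%s %s\n' "$dev" "$(cat "$t")"
    done
}

# Write a new timeout (a positive integer, in seconds) to every disk.
# Not persistent: the kernel default is restored on the next boot.
set_timeouts() {
    new=$1
    case "$new" in
        ''|*[!0-9]*) echo "usage: set_timeouts SECONDS" >&2; return 1 ;;
    esac
    for t in "$SYSFS_ROOT"/block/sd*/device/timeout; do
        [ -e "$t" ] || continue
        echo "$new" > "$t"
    done
}
```

Usage, after sourcing the file as root: run `show_timeouts` to see the current values, then `set_timeouts 180` to raise them all.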

¹http://ask.adaptec.com/app/answers/detail/a_id/15357/related/1

Greetings,

-- 
Camaleón
