Bug#682233: mpt2sas: kernel crash under load with hanged disks

To: Jonathan Nieder <jrnieder@gmail.com>
Cc: 682233@bugs.debian.org
Subject: Bug#682233: mpt2sas: kernel crash under load with hanged disks
From: George Shuklin <george.shuklin@gmail.com>
Date: Mon, 03 Sep 2012 06:09:55 +0400
Message-id: <50441173.9020906@gmail.com>
Reply-to: George Shuklin <george.shuklin@gmail.com>, 682233@bugs.debian.org
In-reply-to: <20120903020130.GA2719@mannheim-rule.local>
References: <20120720140831.4531.85107.reportbug@cvt-xs5> <20120903020130.GA2719@mannheim-rule.local>

We've tested it with vanilla 3.2.12; the problem was the same.


On 03.09.2012 06:01, Jonathan Nieder wrote:

Hi George,

George Shuklin wrote:

Tags: upstream

Which upstream version did you test?

[...]

The bug occurs in kernel versions 3.2 and 3.3, but does not
reproduce in 3.0.

[...]

1) Set up a large raid10.
2) Start a rebuild.
3) Run additional I/O on the RAID (dd if=/dev/md0 of=/dev/md0).
4) Somehow slow down I/O on two or more disks. We hit this bug in
the wild under normal load, but the following scripts reproduce it
within a few minutes:

[...]

end_request: I/O error, dev sdf, sector 729088
------------[ cut here ]------------
kernel BUG at [...]/linux-3.4.4/drivers/scsi/scsi_lib.c:1154!

[...]

Pid: 343, comm: kworker/5:1 Not tainted 3.4-trunk-amd64 #1 Supermicro X8DTN+-F/X8DTN+-F

[...]

Call Trace:
  [<ffffffffa00dbafa>] ? sd_prep_fn+0x2e9/0xb8e [sd_mod]
  [<ffffffff811ace28>] ? cfq_dispatch_requests+0x722/0x880
  [<ffffffff81196589>] ? create_io_context+0x5a/0x5a
  [<ffffffff811993dd>] ? blk_peek_request+0xcf/0x1ac

[...]

Code: 85 c0 74 1d 48 8b 00 48 85 c0 74 15 48 8b 40 48 48 85 c0 74 0c 48 89 ee 48 89 df ff d0 85 c0 75 44 66 83 bd e0 00 00 00 00 75 02 <0f> 0b 48 89 ee 48 89 df e8 62 ec ff ff 48 85 c0 48 89 c2 74 20
RIP  [<ffffffffa0076104>] scsi_setup_fs_cmnd+0x45/0x83 [scsi_mod]

Thanks for a clear report, and sorry for the slow reply.

This is "BUG_ON(!req->nr_phys_segments)".  Smells similar to [1],
which bisected to v3.1-rc1~131^2~31 and was fixed by v3.2.2~91
(md/raid1: perform bad-block tests for WriteMostly devices too,
2012-01-09), aka v3.3-rc3~3^2~2.

But that wouldn't explain triggering the same trace in a 3.4.y kernel.

Is this reproducible with 3.5.2 or newer from experimental?  Which
3.2.y kernel did you use to experience it?

Curious,
Jonathan

[1] http://thread.gmane.org/gmane.linux.raid/36732
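
For readers unfamiliar with the check quoted above, here is a minimal
user-space sketch of the condition "BUG_ON(!req->nr_phys_segments)".
It is not the kernel source: struct fake_request, would_trip_bug_on()
and setup_fs_cmnd_model() are hypothetical names made up for
illustration, and BUG_ON is approximated with assert(). It only shows
that the assertion fires when a filesystem request reaches command
setup with zero physical data segments.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space stand-in for the kernel macro: abort when cond is true. */
#define BUG_ON(cond) assert(!(cond))

/* Hypothetical, heavily reduced model of a block-layer request. */
struct fake_request {
    unsigned short nr_phys_segments; /* number of physical data segments */
};

/* True when the request carries no data segments, i.e. when the
 * condition inside the BUG_ON holds. */
static bool would_trip_bug_on(const struct fake_request *req)
{
    return !req->nr_phys_segments;
}

/* Models only the assertion step of filesystem command setup;
 * everything else the real function does is omitted. */
static void setup_fs_cmnd_model(const struct fake_request *req)
{
    BUG_ON(would_trip_bug_on(req));
}
```

In this model a request with nr_phys_segments == 0 aborts the program,
which corresponds to the "kernel BUG at ... scsi_lib.c:1154!" line in
the trace; a request with one or more segments passes through.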
