Bug#392986: marked as done (linux-image-2.6.16-2-em64t-p4-smp: megaraid_sas issues warnings and RESETs on Perc 5i in Dell PE 2950)



Your message dated Mon, 27 Aug 2007 00:21:51 +0200
with message-id <20070826222150.GM907@baikonur.stro.at>
and subject line Bug#392986: linux-image-2.6.16-2-em64t-p4-smp: megaraid_sas issues warnings and RESETs
has caused the attached Bug report to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what I am
talking about this indicates a serious mail system misconfiguration
somewhere.  Please contact me immediately.)

Debian bug tracking system administrator
(administrator, Debian Bugs database)

--- Begin Message ---
Package: linux-image-2.6.16-2-em64t-p4-smp
Version: 2.6.16-18~bpo.1
Severity: normal

  I feel very bad filing a bug against a backports package, but this
backports package is (according to the changelog) an unmodified
2.6.16-18 package, just recompiled for sarge, and people on the
mailing lists are reporting problems with a variety of OSes [1], so I
have a feeling it's a genuine driver bug with this kernel version.
  In any case, the problem is that under heavy write load, I get
messages like these in /var/log/kern.log:

Oct  2 14:36:01 localhost kernel: sd 0:2:1:0: megasas: RESET -55455 cmd=2a
Oct  2 14:36:01 localhost kernel: megasas: reset successful
Oct  2 14:36:31 localhost kernel: sd 0:2:1:0: megasas: RESET -70369 cmd=2a
Oct  2 14:36:31 localhost kernel: megasas: reset successful
Oct  2 14:37:02 localhost kernel: sd 0:2:1:0: megasas: RESET -83487 cmd=2a
Oct  2 14:37:02 localhost kernel: megasas: reset successful
Oct  2 14:37:32 localhost kernel: sd 0:2:1:0: megasas: RESET -95079 cmd=2a
Oct  2 14:37:32 localhost kernel: megasas: reset successful
Oct  2 14:38:02 localhost kernel: sd 0:2:1:0: megasas: RESET -105361 cmd=2a
Oct  2 14:38:02 localhost kernel: megasas: reset successful
Oct  2 14:38:33 localhost kernel: sd 0:2:1:0: megasas: RESET -115613 cmd=2a
Oct  2 14:38:33 localhost kernel: megasas: reset successful
Oct  2 14:38:33 localhost kernel: sd 0:2:1:0: SCSI error: return code = 0x6000000
Oct  2 14:38:33 localhost kernel: end_request: I/O error, dev sdb, sector 2927091007
Oct  2 14:38:33 localhost kernel: Buffer I/O error on device sdb1, logical block 731772736
Oct  2 14:38:33 localhost kernel: lost page write due to I/O error on sdb1
Oct  2 14:39:03 localhost kernel: sd 0:2:1:0: megasas: RESET -125667 cmd=2a
Oct  2 14:39:03 localhost kernel: megasas: reset successful
Oct  2 14:39:33 localhost kernel: sd 0:2:1:0: megasas: RESET -135588 cmd=2a
Oct  2 14:39:33 localhost kernel: megasas: [ 0]waiting for 1 commands to complete
Oct  2 14:39:34 localhost kernel: megasas: reset successful
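
The cmd= field in the RESET lines above can be read as a hex SCSI opcode; 0x2a is WRITE(10), which is consistent with the resets appearing under write load. The following is a minimal sketch for pulling that opcode out of a log line. The function names are hypothetical helpers for illustration, not part of the megaraid_sas driver, and the line format is assumed to match the excerpt above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return the SCSI opcode from a "megasas: RESET"
 * log line, or -1 if the line is not a RESET line or has no cmd= field.
 */
int megasas_reset_opcode(const char *line)
{
    const char *p;
    unsigned int opcode;

    assert(line != NULL);

    /* Match is case-sensitive, so "reset successful" lines are skipped. */
    p = strstr(line, "megasas: RESET");
    if (p == NULL)
        return -1;

    p = strstr(p, "cmd=");
    if (p == NULL || sscanf(p, "cmd=%x", &opcode) != 1)
        return -1;

    return (int)opcode;
}

/* Map the two most common block I/O opcodes to readable names. */
const char *scsi_opcode_name(int opcode)
{
    switch (opcode) {
    case 0x28: return "READ(10)";
    case 0x2a: return "WRITE(10)";
    default:   return "other";
    }
}
```

Feeding each line of /var/log/kern.log through megasas_reset_opcode() and counting the results by scsi_opcode_name() would show whether the resets are confined to writes.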

  A mailing list posting recommended reducing BLKDEV_MAX_RQ to 8 in
include/linux/blkdev.h as a workaround; I've tried that, and it seems
to work for me.  I suspect that the following patch is the actual fix
(from recent changes to drivers/scsi/megaraid/megaraid_sas.c):

--- a/drivers/scsi/megaraid/megaraid_sas.c 2006-03-20 00:53:29.000000000 -0500
+++ b/drivers/scsi/megaraid/megaraid_sas.c	2006-10-13 12:25:04.000000000 -0400
@@ -1716,6 +1823,12 @@
 	 * Get various operational parameters from status register
 	 */
 	instance->max_fw_cmds = instance->instancet->read_fw_status_reg(reg_set) & 0x00FFFF;
+	/*
+	 * Reduce the max supported cmds by 1. This is to ensure that the
+	 * reply_q_sz (1 more than the max cmd that driver may send)
+	 * does not exceed max cmds that the FW can support
+	 */
+	instance->max_fw_cmds = instance->max_fw_cmds-1;
 	instance->max_num_sge = (instance->instancet->read_fw_status_reg(reg_set) & 0xFF0000) >> 0x10;
 	/*

  ... but, of course, I'm not entirely sure what I'm doing.  This is a
production server now, but I may be able to do some amount of testing
(like installing an etch or unstable partition to test more recent
Debian kernels) from time to time over weekends or during downtime.  If
you'd like me to, let me know and I'll see what I can do.
  I'm also planning to test the above patch after consulting with some
kernel hackers.  I'll let you know how it goes.
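
The arithmetic in the patch above can be illustrated outside the kernel. This is only a sketch: struct fw_limits and decode_fw_status() are hypothetical names, and the status word used in the note below is made up. The masks, the shift, and the "reduce by 1" step are taken from the patch; the reply_q_sz relation (one more than the maximum number of commands) is taken from the patch comment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the limits derived from the status word. */
struct fw_limits {
    uint32_t max_fw_cmds;  /* commands the driver may keep outstanding */
    uint32_t max_num_sge;  /* scatter-gather entries per command */
    uint32_t reply_q_sz;   /* reply queue entries: max_fw_cmds + 1 */
};

/* Decode a firmware status word the way the patched code does. */
struct fw_limits decode_fw_status(uint32_t status)
{
    struct fw_limits lim;
    uint32_t fw_cmds = status & 0x00FFFF;   /* low 16 bits: FW command limit */

    assert(fw_cmds > 0);                    /* sketch-only sanity check */

    /* The patched step: stay one below what the firmware reports ... */
    lim.max_fw_cmds = fw_cmds - 1;
    lim.max_num_sge = (status & 0xFF0000) >> 0x10;

    /* ... so the reply queue does not exceed the firmware's limit. */
    lim.reply_q_sz = lim.max_fw_cmds + 1;
    return lim;
}
```

For a made-up status word of 0x500100, the firmware reports 256 commands, so the driver would use max_fw_cmds = 255 and a reply queue of 256 entries. Without the subtraction the reply queue would need 257 entries, one more than the firmware supports.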

[1] http://lists.us.dell.com/pipermail/linux-poweredge/2006-October/027705.html
    http://lkml.org/lkml/2006/9/6/12
    http://lists.us.dell.com/pipermail/linux-poweredge/2006-August/026821.html

-- System Information:
Debian Release: 3.1
Architecture: amd64 (x86_64)
Kernel: Linux 2.6.16+max-nr-req-8
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)

Versions of packages linux-image-2.6.16-2-em64t-p4-smp depends on:
ii  e2fsprogs                  1.37-2sarge1  ext2 file system utilities and lib
ii  initramfs-tools [linux-ini 0.80~bpo.1    tools for generating an initramfs
ii  module-init-tools          3.2.2-3~bpo.1 tools for managing Linux kernel mo

-- debconf information:
  shared/kernel-image/really-run-bootloader: true
  linux-image-2.6.16-2-em64t-p4-smp/preinst/abort-install-2.6.16-2-em64t-p4-smp:
  linux-image-2.6.16-2-em64t-p4-smp/preinst/bootloader-initrd-2.6.16-2-em64t-p4-smp: true
  linux-image-2.6.16-2-em64t-p4-smp/preinst/initrd-2.6.16-2-em64t-p4-smp:
  linux-image-2.6.16-2-em64t-p4-smp/postinst/old-dir-initrd-link-2.6.16-2-em64t-p4-smp: true
  linux-image-2.6.16-2-em64t-p4-smp/postinst/old-initrd-link-2.6.16-2-em64t-p4-smp: true
  linux-image-2.6.16-2-em64t-p4-smp/preinst/already-running-this-2.6.16-2-em64t-p4-smp:
  linux-image-2.6.16-2-em64t-p4-smp/postinst/bootloader-test-error-2.6.16-2-em64t-p4-smp:
  linux-image-2.6.16-2-em64t-p4-smp/postinst/depmod-error-initrd-2.6.16-2-em64t-p4-smp: false
  linux-image-2.6.16-2-em64t-p4-smp/postinst/kimage-is-a-directory:
  linux-image-2.6.16-2-em64t-p4-smp/postinst/old-system-map-link-2.6.16-2-em64t-p4-smp: true
  linux-image-2.6.16-2-em64t-p4-smp/prerm/would-invalidate-boot-loader-2.6.16-2-em64t-p4-smp: true
  linux-image-2.6.16-2-em64t-p4-smp/preinst/failed-to-move-modules-2.6.16-2-em64t-p4-smp:
  linux-image-2.6.16-2-em64t-p4-smp/postinst/bootloader-error-2.6.16-2-em64t-p4-smp:
  linux-image-2.6.16-2-em64t-p4-smp/postinst/depmod-error-2.6.16-2-em64t-p4-smp: false
  linux-image-2.6.16-2-em64t-p4-smp/preinst/lilo-initrd-2.6.16-2-em64t-p4-smp: true
  linux-image-2.6.16-2-em64t-p4-smp/preinst/elilo-initrd-2.6.16-2-em64t-p4-smp: true
  linux-image-2.6.16-2-em64t-p4-smp/preinst/overwriting-modules-2.6.16-2-em64t-p4-smp: true
  linux-image-2.6.16-2-em64t-p4-smp/preinst/abort-overwrite-2.6.16-2-em64t-p4-smp:
  linux-image-2.6.16-2-em64t-p4-smp/postinst/create-kimage-link-2.6.16-2-em64t-p4-smp: true
  linux-image-2.6.16-2-em64t-p4-smp/prerm/removing-running-kernel-2.6.16-2-em64t-p4-smp: true
  linux-image-2.6.16-2-em64t-p4-smp/preinst/lilo-has-ramdisk:


--- End Message ---
--- Begin Message ---
On Sun, Aug 26, 2007 at 05:04:58PM -0400, Andrew Moise wrote:
>   After being advised that upgrading to more recent firmware fixes
> this problem, I've installed Dell driver update R149666, which
> upgrades the Perc 5i firmware to version v5.1.1-0040.  That seems to
> solve this problem for me even when I boot back into the unmodified
> 2.6.16 kernel.  I therefore believe that this bug should be considered
> a firmware bug instead of a kernel bug, and closed in Debian's BTS.
>   Thanks!

thanks for your feedback!
closing

-- 
maks
 

--- End Message ---