
Bug#392986: linux-image-2.6.16-2-em64t-p4-smp: megaraid_sas issues warnings and RESETs on Perc 5i in Dell PE 2950



Package: linux-image-2.6.16-2-em64t-p4-smp
Version: 2.6.16-18~bpo.1
Severity: normal

  I feel bad filing a bug against a backports package, but according to
its changelog this one is an unmodified 2.6.16-18 package, just
recompiled for sarge. People on the mailing lists are also reporting
problems on a variety of OSes [1], so I suspect it's a genuine driver
bug in this kernel version.
  In any case, the problem is that under heavy write load the controller
is reset repeatedly, and occasionally a write fails with an I/O error.
/var/log/kern.log shows messages like these:

Oct  2 14:36:01 localhost kernel: sd 0:2:1:0: megasas: RESET -55455 cmd=2a
Oct  2 14:36:01 localhost kernel: megasas: reset successful
Oct  2 14:36:31 localhost kernel: sd 0:2:1:0: megasas: RESET -70369 cmd=2a
Oct  2 14:36:31 localhost kernel: megasas: reset successful
Oct  2 14:37:02 localhost kernel: sd 0:2:1:0: megasas: RESET -83487 cmd=2a
Oct  2 14:37:02 localhost kernel: megasas: reset successful
Oct  2 14:37:32 localhost kernel: sd 0:2:1:0: megasas: RESET -95079 cmd=2a
Oct  2 14:37:32 localhost kernel: megasas: reset successful
Oct  2 14:38:02 localhost kernel: sd 0:2:1:0: megasas: RESET -105361 cmd=2a
Oct  2 14:38:02 localhost kernel: megasas: reset successful
Oct  2 14:38:33 localhost kernel: sd 0:2:1:0: megasas: RESET -115613 cmd=2a
Oct  2 14:38:33 localhost kernel: megasas: reset successful
Oct  2 14:38:33 localhost kernel: sd 0:2:1:0: SCSI error: return code = 0x6000000
Oct  2 14:38:33 localhost kernel: end_request: I/O error, dev sdb, sector 2927091007
Oct  2 14:38:33 localhost kernel: Buffer I/O error on device sdb1, logical block 731772736
Oct  2 14:38:33 localhost kernel: lost page write due to I/O error on sdb1
Oct  2 14:39:03 localhost kernel: sd 0:2:1:0: megasas: RESET -125667 cmd=2a
Oct  2 14:39:03 localhost kernel: megasas: reset successful
Oct  2 14:39:33 localhost kernel: sd 0:2:1:0: megasas: RESET -135588 cmd=2a
Oct  2 14:39:33 localhost kernel: megasas: [ 0]waiting for 1 commands to complete
Oct  2 14:39:34 localhost kernel: megasas: reset successful

  A mailing list posting recommended reducing BLKDEV_MAX_RQ to 8 in
include/linux/blkdev.h as a workaround; I've tried that, and it seems
to work for me.  I suspect that the following patch is the actual fix
(from recent changes to drivers/scsi/megaraid/megaraid_sas.c):

--- a/drivers/scsi/megaraid/megaraid_sas.c 2006-03-20 00:53:29.000000000 -0500
+++ b/drivers/scsi/megaraid/megaraid_sas.c	2006-10-13 12:25:04.000000000 -0400
@@ -1716,6 +1823,12 @@
 	 * Get various operational parameters from status register
 	 */
 	instance->max_fw_cmds = instance->instancet->read_fw_status_reg(reg_set) & 0x00FFFF;
+	/*
+	 * Reduce the max supported cmds by 1. This is to ensure that the
+	 * reply_q_sz (1 more than the max cmd that driver may send)
+	 * does not exceed max cmds that the FW can support
+	 */
+	instance->max_fw_cmds = instance->max_fw_cmds-1;
 	instance->max_num_sge = (instance->instancet->read_fw_status_reg(reg_set) & 0xFF0000) >> 0x10;
 	/*

  ... but, of course, I'm not entirely sure what I'm doing.  This is a
production server now, but I may be able to do some testing (for
example, installing etch or unstable on a separate partition to try more
recent Debian kernels) on weekends or during downtime.  If you'd like me
to, let me know and I'll see what I can do.
  I'm also planning to test the above patch after consulting with some
kernel hackers.  I'll let you know how it goes.

[1] http://lists.us.dell.com/pipermail/linux-poweredge/2006-October/027705.html
    http://lkml.org/lkml/2006/9/6/12
    http://lists.us.dell.com/pipermail/linux-poweredge/2006-August/026821.html

-- System Information:
Debian Release: 3.1
Architecture: amd64 (x86_64)
Kernel: Linux 2.6.16+max-nr-req-8
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)

Versions of packages linux-image-2.6.16-2-em64t-p4-smp depends on:
ii  e2fsprogs                  1.37-2sarge1  ext2 file system utilities and lib
ii  initramfs-tools [linux-ini 0.80~bpo.1    tools for generating an initramfs
ii  module-init-tools          3.2.2-3~bpo.1 tools for managing Linux kernel mo

-- debconf information:
  shared/kernel-image/really-run-bootloader: true
  linux-image-2.6.16-2-em64t-p4-smp/preinst/abort-install-2.6.16-2-em64t-p4-smp:
  linux-image-2.6.16-2-em64t-p4-smp/preinst/bootloader-initrd-2.6.16-2-em64t-p4-smp: true
  linux-image-2.6.16-2-em64t-p4-smp/preinst/initrd-2.6.16-2-em64t-p4-smp:
  linux-image-2.6.16-2-em64t-p4-smp/postinst/old-dir-initrd-link-2.6.16-2-em64t-p4-smp: true
  linux-image-2.6.16-2-em64t-p4-smp/postinst/old-initrd-link-2.6.16-2-em64t-p4-smp: true
  linux-image-2.6.16-2-em64t-p4-smp/preinst/already-running-this-2.6.16-2-em64t-p4-smp:
  linux-image-2.6.16-2-em64t-p4-smp/postinst/bootloader-test-error-2.6.16-2-em64t-p4-smp:
  linux-image-2.6.16-2-em64t-p4-smp/postinst/depmod-error-initrd-2.6.16-2-em64t-p4-smp: false
  linux-image-2.6.16-2-em64t-p4-smp/postinst/kimage-is-a-directory:
  linux-image-2.6.16-2-em64t-p4-smp/postinst/old-system-map-link-2.6.16-2-em64t-p4-smp: true
  linux-image-2.6.16-2-em64t-p4-smp/prerm/would-invalidate-boot-loader-2.6.16-2-em64t-p4-smp: true
  linux-image-2.6.16-2-em64t-p4-smp/preinst/failed-to-move-modules-2.6.16-2-em64t-p4-smp:
  linux-image-2.6.16-2-em64t-p4-smp/postinst/bootloader-error-2.6.16-2-em64t-p4-smp:
  linux-image-2.6.16-2-em64t-p4-smp/postinst/depmod-error-2.6.16-2-em64t-p4-smp: false
  linux-image-2.6.16-2-em64t-p4-smp/preinst/lilo-initrd-2.6.16-2-em64t-p4-smp: true
  linux-image-2.6.16-2-em64t-p4-smp/preinst/elilo-initrd-2.6.16-2-em64t-p4-smp: true
  linux-image-2.6.16-2-em64t-p4-smp/preinst/overwriting-modules-2.6.16-2-em64t-p4-smp: true
  linux-image-2.6.16-2-em64t-p4-smp/preinst/abort-overwrite-2.6.16-2-em64t-p4-smp:
  linux-image-2.6.16-2-em64t-p4-smp/postinst/create-kimage-link-2.6.16-2-em64t-p4-smp: true
  linux-image-2.6.16-2-em64t-p4-smp/prerm/removing-running-kernel-2.6.16-2-em64t-p4-smp: true
  linux-image-2.6.16-2-em64t-p4-smp/preinst/lilo-has-ramdisk:


