Re: QLogic PTI firmware
- To: firstname.lastname@example.org
- Subject: Re: QLogic PTI firmware
- From: Mark Morgan Lloyd <markMLl.email@example.com>
- Date: Sun, 03 Feb 2013 15:26:46 +0000
- Message-id: <firstname.lastname@example.org>
- In-reply-to: <email@example.com>
- References: <firstname.lastname@example.org> <email@example.com> <alpine.LNX.firstname.lastname@example.org> <email@example.com> <firstname.lastname@example.org> <email@example.com>
Mark Morgan Lloyd wrote:
What's acceptable cabling practice on this? It's been set up hung off
a single controller with the two halves daisy-chained. Cables are Sun
or (decent) IBM and it's a Sun differential terminator; I see no
failures if the job count is <=4 (but I continue testing this - it's
useful extra heat).
I think it's worth noting explicitly that there are 12x CPUs in this
system. I note (but don't see as directly relevant) that the
controller won't load the firmware during startup; it has to be done by
a manual rmmod/modprobe.
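i.e. presumably something along these lines (module name as discussed in the thread; needs root, and this is a sketch rather than a tested procedure):

```shell
#!/bin/sh
# Workaround described above: unload and reload the qlogicpti module so
# it re-requests the ISP1000 firmware.  Run as root; untested sketch.
reload_qlogicpti() {
  rmmod qlogicpti || return 1   # fails if the module is busy or absent
  modprobe qlogicpti            # reload, which triggers the firmware load
}
# Usage (as root): reload_qlogicpti
```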
Chris, I think something from you vanished into the spambin at about
20:30. Please could you resend it to the address below.
"My previous message said that best practice with SBUS is to use only
half of the D1000 per controller channel. 6 fast drives is about the max
that you can expect the bus speed limited controller to handle without
congestion under heavy loads."
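As a rough sanity check on that figure, a back-of-the-envelope calculation lands in the same region - assuming a Fast/Wide 20 MB/s SCSI channel and roughly 3 MB/s sustained per drive of that vintage (both numbers are my own assumptions, not from the thread):

```shell
#!/bin/sh
# Back-of-the-envelope: how many drives can stream flat-out before the
# bus saturates?  Both figures below are assumptions, not measurements.
BUS_MBPS=20     # assumed Fast/Wide SCSI channel bandwidth
DRIVE_MBPS=3    # assumed sustained throughput per drive of this era
MAX_DRIVES=$((BUS_MBPS / DRIVE_MBPS))
echo "$MAX_DRIVES"   # -> 6, in line with the "6 fast drives" rule of thumb
```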
I can cope with limited performance; there are times when having plenty of
slots into which arbitrary drives can be plugged (e.g. to fix a dud
SILO) can be really useful. Having said which, I note that the
A1000/D1000 "Just The Facts" explicitly shows the possibility of having
both halves of the box connected to a single host controller.
..although that illustration was to an unidentified controller on an
"OTOH, it seems that Linux may not be handling congestion as gracefully
Indeed. In fact, it doesn't appear to be "picking up the pieces" as
intended. From the qlogicpti driver (drivers/scsi/qlogicpti.c):

    printk(KERN_EMERG "qlogicpti%d: request queue overflow\n",
           qpti->qpti_id);

    /* Unfortunately, unless you use the new EH code, which
     * we don't, the midlayer will ignore the return value,
     * which is insane. We pick up the pieces like this.
     */
    Cmnd->result = DID_BUS_BUSY;
I'm still working on it to see if I can track it down to a single drive
or a particular slot in the rack.
Patrick, thanks for your comment about the firmware being at
linux-2.6/firmware/qlogic/isp1000.bin.hex in the standard (i.e.
After much testing, I've tracked the problem down to two Sun/Fujitsu
18.2GB drives which will kill the entire system fairly promptly if the
qlogicpti module's brought up with them in certain slots, even if there
are only 6x drives in the array rather than the full 12x. I speculate
that there's a problem with SCSI address decoding or similar on the
problematic SCA drives.
With these quarantined and replaced by known-good drives to take the
array to its full complement of 12x, I can run any combination of up to
10x drives reliably in the array but not the full 12x: trying to do so
still causes an eventual kernel panic. Pulling half the CPUs in a crude
attempt to reduce concurrency doesn't improve things. The impression I
get is that the controller (and/or its supporting firmware and Linux
driver) isn't up to handling a full string of 12x drives under a heavy
load.
The test I'm using is to write random data to the start of each drive,
then to dd this in blocks of approx 256M to the remainder.
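For reference, that procedure looks roughly like this as a script. It's a sketch under my own assumptions about device naming and sizing, and it destroys everything on the target, so it should only ever be pointed at a scratch drive:

```shell
#!/bin/sh
# Sketch of the stress test described above: write random data to the
# start of the drive, then dd that chunk over the remainder in ~256M
# blocks.  The device name and size arguments are illustrative.
stress_write() {
  dev=$1       # target device, e.g. /dev/sdb -- DESTRUCTIVE
  chunk_mb=$2  # chunk size in MB (approx 256 in the test described)
  size_mb=$3   # usable capacity of the device in MB

  # 1. Seed the start of the drive with random data.
  dd if=/dev/urandom of="$dev" bs=1M count="$chunk_mb" conv=notrunc 2>/dev/null

  # 2. Copy that chunk to each successive offset up to the end,
  #    generating sustained sequential write load.
  i=1
  while [ $((i * chunk_mb)) -lt "$size_mb" ]; do
    dd if="$dev" of="$dev" bs=1M count="$chunk_mb" \
       seek=$((i * chunk_mb)) conv=notrunc 2>/dev/null
    i=$((i + 1))
  done
}

# Example (destructive!): stress_write /dev/sdb 256 17500
```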
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk
[Opinions above are the author's, not those of his employers or colleagues]