
Re: QLogic PTI firmware



Mark Morgan Lloyd wrote:

What's acceptable cabling practice on this? It's been set up hung off a single controller with the two halves daisy-chained. Cables are Sun or (decent) IBM and it's a Sun differential terminator. I see no failures if the job count is <= 4 (but I continue testing this - it's useful extra heat).

I think it's worth noting explicitly that there are 12x CPUs in this system. I note (but don't see as directly relevant) that the controller won't load the firmware during startup; it has to be done by a manual rmmod/modprobe.
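
For the record, the manual reload is nothing cleverer than the following (module name as already mentioned; the dmesg check is just to see what the driver reports about the firmware):

  rmmod qlogicpti
  modprobe qlogicpti
  dmesg | tail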

Chris, I think something from you vanished into the spambin at about 20:30. Please could you resend it to the address below.

"My previous message said that best practice with SBUS is to use only half of the D1000 per controller channel. 6 fast drives is about the max that you can expect the bus speed limited controller to handle without congestion under heavy loads."

I can cope with limited performance; there are times when having plenty of slots into which arbitrary drives can be plugged (e.g. to fix a dud SILO) can be really useful. Having said which, I note that the A1000/D1000 "Just The Facts" explicitly shows the possibility of having both halves of the box connected to a single host controller.

...although that illustration was to an unidentified controller on an Ultra-60.

"OTOH, it seems that Linux may not be handling congestion as gracefully as Solaris."

Indeed. In fact, it doesn't appear to be "picking up the pieces" particularly successfully. The relevant fragment of the driver (drivers/scsi/qlogicpti.c) reads:

toss_command:
        printk(KERN_EMERG "qlogicpti%d: request queue overflow\n",
               qpti->qpti_id);

        /* Unfortunately, unless you use the new EH code, which
         * we don't, the midlayer will ignore the return value,
         * which is insane.  We pick up the pieces like this.
         */
        Cmnd->result = DID_BUS_BUSY;
        done(Cmnd);
        return 1;
}

I'm still working on it to see if I can track it down to a single drive or a particular slot in the rack.

Patrick, thanks for your comment about the firmware being at linux-2.6/firmware/qlogic/isp1000.bin.hex in the standard (i.e. non-Debian) kernel.

After much testing, I've tracked the problem down to two Sun/Fujitsu 18.2GB drives which will kill the entire system fairly promptly if the qlogicpti module's brought up with them in certain slots, even if there are only 6x drives in the array rather than the full 12x. I speculate that there's a problem with SCSI address decoding or similar on the problematic SCA drives.

With these quarantined and replaced by known-good drives to take the array back to its full complement of 12x, I can run any combination of up to 10x drives reliably in the array, but not the full 12x: trying to do so still causes an eventual kernel panic. Pulling half the CPUs in a crude attempt to reduce concurrency doesn't improve things. The impression I get is that the controller (and/or its supporting firmware and Linux driver) isn't up to handling a full string of 12x drives under a heavy workload.

The test I'm using is to write random data to the start of each drive, then to dd this back out in blocks of approx. 256M to fill the remainder of the drive.
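
In case anyone wants to reproduce this, the per-drive job amounts to roughly the following sketch (the device name and exact loop are illustrative only; I run one such job per drive under test):

  #!/bin/sh
  # Seed the start of the drive with random data, then copy that chunk
  # back out over the rest of the drive in ~256M pieces until dd runs
  # off the end of the device.
  DISK=/dev/sdb        # illustrative - substitute the drive under test
  CHUNK=256            # chunk size in MB

  dd if=/dev/urandom of=$DISK bs=1M count=$CHUNK

  i=1
  while dd if=$DISK of=$DISK bs=1M count=$CHUNK seek=$((i * CHUNK)) 2>/dev/null
  do
          i=$((i + 1))
  done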

--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]

