
Re: LSI MegaRAID SAS 9240-4i hangs system at boot



On Sun, 20 May 2012, Stan Hoeppner wrote:
> On 5/19/2012 11:05 AM, Henrique de Moraes Holschuh wrote:
> > On Sat, 19 May 2012, Ramon Hofer wrote:
> >>>> And after a while there are more messages which I don't understand. I
> >>>> have taken a picture:
> >>>> http://666kb.com/i/c3wf606sc1qkcvgoc.jpg
> >>>
> >>> It shows that udev is having serious trouble handling one of the USB
> >>> devices.
> >>
> >> Yes but only when the lsi card is attached. When it's removed the 
> > 
> > Get a better PSU, and if that doesn't work, either junk the motherboard, or
> > give up on adding any cards that require a bit more power.
> 
> This is absolutely horrible advice.  Any moderate horsepower PCIe x16

Well, yes.  But mostly because I left out the proper caveat: "but first
check whether you can supply extra power through MOLEX connectors".  I
apologise for that one.

I *have* been through oversubscribed power rails before, caused by el-cheapo
PSUs and onboard (motherboard) voltage regulators, as well as by undersized
PSUs (in servers).  I've also been through overload scenarios caused by bad
memory modules and by a bad keyboard (which had developed low-resistance
paths akin to very small short-circuits).  The system goes slightly insane
and all sorts of weird defects show up, INCLUDING tripping the overcurrent
detector on the root USB hub because +5V floats too much, etc.
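
To check whether the kernel really is reporting overcurrent events (rather
than some other udev/USB failure), something like the following could be
used to filter the kernel log.  This is only a minimal sketch: the function
name find_overcurrent is made up for illustration, and it assumes the USB
hub driver words these messages with "over-current", which may differ
between kernel versions.

```python
import re

# Assumption: the USB hub driver logs these events using the word
# "over-current" (or "overcurrent"); adjust if your kernel differs.
OVERCURRENT = re.compile(r"over-?current", re.IGNORECASE)


def find_overcurrent(lines):
    """Return the log lines that mention a USB over-current event."""
    return [line.rstrip("\n") for line in lines if OVERCURRENT.search(line)]
```

Feed it the output of dmesg (or the lines of /var/log/kern.log) once with
the LSI card installed and once without it, and compare the results.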

> these SAS boards.  Too much PCIe power draw isn't the issue here, unless
> the mobo is possibly defective.  I doubt this is the case.  It's most

Or the PSU can't supply enough power to whichever rail the onboard VRs use
to feed the PCIe slots and the chipset (it might not be the 3.3V/5V rails;
some boards prefer to do it from the 12V rail through a DC-DC VR).

> likely a firmware bug in the HBA or the system BIOS, or a driver bug in
> 2.6.32, or a combination of these.  We should know after Ramon runs
> through the task list I provided earlier.

AFAIK, the only kernel bugs that could cause overcurrent misdetects are a
problem with interrupt sharing, which should not be possible on a modern
board where everything PCIe uses MSI/MSI-X (though the Linux USB core is
still incapable of using MSI/MSI-X, at least up to kernel 3.2), or memory
corruption, which is less deterministic.
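
Whether the USB controllers actually share an interrupt line with the HBA
can be read from /proc/interrupts.  A rough sketch follows, assuming the
usual layout (a header row of CPUn columns, then one row per IRQ, with
devices that share a line separated by commas).  The function name
shared_irqs is hypothetical.

```python
def shared_irqs(text):
    """Map IRQ number -> description for IRQ lines with several devices.

    Assumes the usual /proc/interrupts layout: a header row naming the
    CPUs, then one row per IRQ whose trailing text lists the attached
    devices, comma-separated when the line is shared.
    """
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields:
            continue
        irq = fields[0].rstrip(":")
        if not irq.isdigit():
            continue  # skip summary rows such as NMI: or ERR:
        # Everything after the per-CPU counters: interrupt type + devices.
        desc = " ".join(fields[1 + ncpu:])
        if "," in desc:
            shared[int(irq)] = desc
    return shared
```

Call it on the contents of /proc/interrupts.  An entry that lists both a
USB host controller and the storage driver would point at shared legacy
interrupts, while rows marked PCI-MSI are not shared.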

Firmware bugs in SMM code can cause just about anything, but it seems
unlikely they'd mess with the overcurrent alarm report bits in the USB
chipset because of a disk controller.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

