
Re: LSI MegaRAID SAS 9240-4i hangs system at boot



On 5/30/2012 4:52 PM, Ramon Hofer wrote:
> On Tue, 29 May 2012 20:49:32 -0500
> Stan Hoeppner <stan@hardwarefreak.com> wrote:
> 
>> On 5/29/2012 7:09 AM, Ramon Hofer wrote:
>>> On Sun, 20 May 2012 21:37:19 -0500
>>> Stan Hoeppner <stan@hardwarefreak.com> wrote:
>>>
>>> (...)
>>>
>>>> Does the mobo BIOS show the disk device?  If not, does the 9240
>>>> BIOS show the disk device, RAID level, and its size?
>>>>
>>>> What we need to figure out is whether this is a BIOS problem at
>>>> this point or a Debian installer kernel driver problem.
>>>
>>> I have finally found some time to work on the problem:
>>>
>>> I set up a raid1 in the hba bios. I couldn't install onto it with
>>> the supermicro mb.
>>>
>>> Then I mounted the lsi hba into my old server with an Asus mb (I can't
>>> remember which one it is; I'll have to check at home...). It
>>> (almost) works like a charm.
>>> The only issue is that I can't enter the hba BIOS when it's mounted
>>> in the Asus mb. But when I put it back into the Supermicro mb I can
>>> access it again. Very strange!
>>
>> This behavior isn't strange.  Just about every mobo BIOS has an option
>> to ignore or load option ROMs.  On your SuperMicro board this is
>> controlled by the setting "AddOn ROM Display Mode" under the "Boot
>> Feature" menu.  Your ASUS board likely has a similar feature that is
>> currently disabled, preventing the LSI option ROM from being loaded.
> 
> Very interesting! I didn't know that.
> The values I can choose for the "AddOn ROM Display Mode" are
> "Keep current" and "Force Bios". I have chosen the Force Bios option.
> And I have disabled the two options you describe below.
> In the supermicro the hba's init screen isn't displayed at all now.
> In the asus, on the other hand, I see the init screen where the attached
> discs are listed; I just can't enter the configuration program with
> Ctrl+H, although the message to press these keys is shown.
> 
> I'm now able to boot into the 2.6.32-5 kernel.
> It takes quite a while until the megasas module is loaded (I suppose):
> the over-current messages are shown for a while (~2 mins) and then it
> boots normally to the login prompt.
> When I leave it alone I get the message:
> 
> INFO: task scsi_scan_0:341 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> message.
> 
> After booting the first time this evening I installed the bpo 3.2
> kernel.
> When I try to reboot the stable kernel the system hangs after the
> message "Will now restart."
> 
> After a while the above message about the blocked task appears again.
> 
> The bpo kernel 3.2 seems to fail. The two over-current messages are
> shown and then this message:
> http://pastebin.com/raw.php?i=XqVunR9e
> 
> 
> When I load the stable kernel it stops for a while again after the
> over-current message, then finally gets to the login prompt. After a
> while I got this message:
> http://pastebin.com/raw.php?i=w409KaFN
> 
> 
>>> But apart from that I could install Debian onto the raid1. Then I
>>> set
>>
>> This was on the ASUS board correct?  Were you able to boot the RAID1
>> device after install?  If so this indeed would be strange as you
>> should not be able to boot from the HBA if its ROM isn't loaded.
> 
> No, I wasn't able to boot the kernel installed to the RAID1. Grub was
> loaded, but only because I'd installed it to the disk directly attached
> to the MB's SATA controller.
> But when choosing the RAID1 kernel it stopped (I can't remember the
> message anymore). I thought I hadn't set the boot option for the raid1
> in the hba bios properly.
> 
> 
>>> the bios to use the disks as jbods and installed Debian again to a
>>> drive directly attached to the mb sata controller.
>>> With the original squeeze kernel the disks attached to the hba
>>> weren't visible. But after updating to the bpo kernel I can fdisk
>>> them separately and put them into a raid5 (in the end I want to apply
>>> the 500G partition method Cameleon suggested).
>>
>> This experience with the ASUS board leads me to wonder if disabling
>> the option ROM and INT19 on the SM board would allow everything to
>> function properly.  Try that before you take the board to the dealer
>> for flashing.  Assuming you've deleted any BIOS configured RAID
>> devices in the HBA BIOS already and all drives are configured for
>> JBOD mode, drop the HBA back into the SM board, go into the SM BIOS,
>> set "PCI Slot X Option ROM" to "DISABLED" where X is the number of
>> the PCIe slot in which the LSI HBA is inserted.  Set "Interrupt 19
>> Capture" to "DISABLED".  Save settings and reboot.
>>
>> You should now see the same behavior as on the ASUS, including the HBA
>> BIOS not showing up during the boot process.  Which I'm thinking is
>> the key to it working on the ASUS as the ROM code is never resident.
>> Thus it is not causing problems with kernel driver, which is
>> apparently assuming the 9240 series ROM will not be resident.
> 
> Maybe I wasn't clear about that. The hba BIOS seems to be loaded in the
> asus as well but I just can't enter its setting with ctrl+h.
> 
> Does all of this tell us anything :-?
> 
> 
>> This loading of the option ROM code is what some would consider the
>> difference between "HBA RAID mode" and "HBA JBOD mode".
> 
> Well, then it seems that if I want to use Linux software raid I'd
> better keep the setting that disables loading of the option ROM :-/
> 
> 
>>>> Did you already flash the C7P67 BIOS to the latest version?  I
>>>> can't recall.
>>>
>>> I have tried to do that but it was quite strange.
>>> I created a freedos usb stick with unetbootin and copied the files
>>> for the update from supermicro into the stick. I did exactly what
>>> the readmes told me. But when I did it the first time there was no
>>> output of the flash process and the directory where the supermicro
>>> files were located on the stick was empty.
>>> When I tried the procedure again it complained that I have to
>>> first install version 1.
>>
>> Unfortunately flashing a mobo BIOS is still not always an uneventful or
>> routine process, even in 2012.
> 
> Yes, I've had issues both times I tried to do that (now and about
> a year ago with an Intel mainboard) :-(
> Maybe this should tell me something ;-)
> 
> 
>>> I will now bring it to my dealer who can do the BIOS update for me.
>>>
>>> And I will write to Supermicro to ask whether they are aware of the issue.
>>
>> Try what I mention above before doing either of these things.
> 
> I've already mailed both of them on Monday.
> 
> The dealer tells me to do everything on my own.
> 
> But Supermicro is very helpful. They described how to flash the bios
> before they knew about my problem with v1.10, which the BIOS
> updater wants me to install first.
> They even attached the zip. Unfortunately it wasn't complete (the
> installer complained about a missing file).
> 
> They're also helping me install v1.10, but again I can't find the .ROM
> file that their instructions in the mail tell me to rename.
> So I asked again this evening...
> 
> Hopefully I can flash v1.10 to the Supermicro tomorrow and then update
> to the newest version.
> Maybe I then am already able to boot :-)
> Or I'll try the steps you described about a week ago again and keep the
> load option ROM setting off.
> If that doesn't help either, I will try the newest firmware from lsi,
> which was just released on May 21, 2012.
> 
> Is this a good idea, or do you have better advice?

I'd get the mobo and HBA BIOS to the latest revs.  Then if it still
doesn't work, as I recommended earlier, you need to try another
non-Debian based distro to eliminate the possibility that Debian is
doing something goofy in their kernels.  If neither the latest versions
of SuSE nor Fedora work, then it's clear you have an upstream kernel
issue, or a hardware issue.  Either way, that gives you good information
to present to LSI Support when you contact them.

Ultimately, if anyone has the answer to this mystery, it will be
LSI or the upstream kernel devs, as you've performed pretty much every
troubleshooting step available to an end user.  You may want to post a
brief description of the problem to the linux-scsi list.  The guys who
wrote and maintain the upstream LSI Linux drivers are on that mailing list.

FWIW, LSI certifies the 9240-4i (all their boards actually) as
compatible with all point releases of Debian 5.x.  They don't have a
compat doc later than Dec 2010 for this board series, so I'm not sure
what their support policy is for Debian 6.

-- 
Stan

