
Re: LSI MegaRAID SAS 9240-4i hangs system at boot



On Tue, 12 Jun 2012 17:30:43 -0500
Stan Hoeppner <stan@hardwarefreak.com> wrote:

> On 6/12/2012 8:40 AM, Ramon Hofer wrote:
> > On Sun, 10 Jun 2012 17:30:08 -0500
> > Stan Hoeppner <stan@hardwarefreak.com> wrote:
> 
> >> Try the Wheezy installer.  Try OpenSuSE.  Try Fedora.  If any of
> >> these work without lockup we know the problem is Debian 6.
> >> However...
> > 
> > I didn't do this because the LSI worked with the Asus mobo and
> > Debian Squeeze, and because I couldn't install OpenSuSE or Fedora.
> > But I will give it another try...
> 
> Your problem may involve more than just the two variables.  The
> problem may be mobo+LSI+distro_kernel, not just mobo+LSI.  This is
> why I suggested trying to install other distros.

Aha, that's true - I hadn't thought of that...


> >> Please call LSI support before you attempt any additional
> >> BIOS/firmware updates.
> 
> Note I stated "call".  You're likely to get more/better
> information/assistance speaking to a live person.

I didn't have enough confidence in my spoken English :-(


> > It sounds like the issue is related to the bootstrap, so either to
> > resolve the issue you will have to free up the option ROM space or
> > limit the number of devices during POST."
> 
> This is incorrect advice, as it occurs with the LSI BIOS both enabled
> and disabled.  Apparently you didn't convey this in your email.

I will write to them again.
But to be honest, I think I'll give up on the Supermicro for the server
and use it for my desktop instead.


(...) 

> > No no, I was aware that I can have several RAID arrays.
> > My initial plan was to use four disks of the same size and have
> > several RAID5 devices.
> 
> This is what you should do.  I usually recommend RAID10 for many
> reasons, but I'm guessing you need more than half of your raw storage
> space.  RAID10 eats 1/2 of your disks for redundancy.  It also has the
> best performance by far, and the lowest rebuild times by far.  RAID5
> eats 1 disk for redundancy, RAID6 eats 2.  Both are very slow compared
> to RAID10, and both have long rebuild times which increase severely as
> the number of drives in the array increases.  The drive rebuild time
> for RAID10 is the same whether your array has 4 disks or 40 disks.

Yes, I think RAID5 is sufficient for me. I don't need extreme
performance or extreme redundancy. I just hope that the RAID5 setup
will be safe enough :-)
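If I calculate it right, with four 2 TB disks that would mean:

RAID10: 4 * 2 TB / 2   = 4 TB usable
RAID5:  (4-1) * 2 TB   = 6 TB usable
RAID6:  (4-2) * 2 TB   = 4 TB usable

So RAID5 only costs me one disk per array, which is why I'd like to
stick with it.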


> If you're more concerned with double drive failure during rebuild (not
> RESHAPE as you stated) than usable space, make 4 drive RAID10 arrays
> or 4 drive RAID6s, again, without partitions, using the command
> examples I provided as a guide.

Well, it's just multimedia data that's stored on this server. So if I
lose it, it won't kill me :-)


> > Is there some documentation why partitions aren't good to use?
> > I'd like to learn more :-)
> 
> Building md arrays from partitions on disks is a means to an end.  Do
> you have an end that requires these means?  If not, don't use
> partitions.  The biggest reason to NOT use partitions is misalignment
> on advanced format drives.  The partitioning utilities shipped with
> Squeeze, AFAIK, don't do automatic alignment on AF drives.

Ok, I was just confused because most of the tutorials (or at least
most of the ones I found) create one partition spanning the whole disk
and build the array from those partitions...


> If you misalign the partitions, RAID5/6 performance will drop by a
> factor of 4, or more, during RMW operations, i.e. modifying a file or
> directory metadata.  The latter case is where you really take the
> performance hit as metadata is modified so frequently.  Creating md
> arrays from bare AF disks avoids partition misalignment.

So if I can make things simpler I'm happy :-)
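Just to make sure I understand: instead of partitioning I would point
mdadm directly at the whole disks, something like this (device names
and the 128KB chunk are only my guesses for my box):

~$ mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=128 /dev/sd[bcde]

i.e. /dev/sdb instead of /dev/sdb1, so there is no partition table that
could be misaligned. Is that right?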


> > Does it work as well with hw RAID devices from the LSI card?
> 
> Your LSI card is an HBA with full RAID functions.  It is not however a
> full blown RAID card--its ASIC is much lower performance and it has no
> cache memory.  For RAID1/10 it's probably a toss up at low disk counts
> (4-8).  At higher disk counts, or with parity RAID, md will be faster.
> But given your target workloads you'll likely not notice a difference.

You're right.
I just had the impression that, at the beginning of this conversation,
you had suggested I use the hw RAID capability of the LSI.


> >> Then make a write aligned XFS filesystem on this linear device:
> >>
> >> ~$ mkfs.xfs -d agcount=11 su=131072,sw=3 /dev/md2
> > 
> > Are there similar options for jfs?
> 
> Dunno.  Never used it, as XFS is superior in every way.  JFS hasn't
> seen a feature release since 2004.  It's been in bug fix only mode
> for 8 years now.  XFS has a development team of about 30 people
> working at all the major Linux distros, SGI, and IBM, yes, IBM.  It
> has seen constant development since its initial release on IRIX in
> 1994 and its port to Linux in the early 2000s.

I must have read outdated wikis (mostly from the mythtv project).


> > Especially because I read in wikipedia that xfs is
> > integrated in the kernel and to use jfs one has to install
> > additional packages.
> 
> You must have misread something.  The JFS driver was still in mainline
> as of 3.2.6, and I'm sure it's still in 3.4 though I've not confirmed
> it.  So you can build JFS right into your kernel, or as a module.  I'd
> never use it, nor recommend it, I'm just squaring the record.

I found this information in the German Wikipedia
(http://de.wikipedia.org/wiki/XFS_%28Dateisystem%29):

"... Seit Kernel-Version 2.6 ist es offizieller Bestandteil des
Kernels. ..."

Translated: Since kernel version 2.6 it's an official part of the
kernel.

Maybe I misunderstood what the writer meant, or maybe what they wrote
is simply wrong in the first place :-?
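I guess I could also just have checked the Squeeze kernel config
instead of trusting the wiki, with something like:

~$ grep -E 'CONFIG_(XFS|JFS)_FS' /boot/config-$(uname -r)

If both show up as =y or =m then both drivers are in the Debian kernel;
only the userspace tools (xfsprogs, jfsutils) come as separate
packages, which is probably what confused me.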


> > Btw it seems very complicated with all the allocation groups, stripe
> > units and stripe width.
> 
> Powerful flexibility is often accompanied by a steep learning curve.

True :-)


> > How do you calculate these number?
> 
> Beginning users don't.  You use the defaults.  You are confused right
> now because I lifted the lid and you got a peek inside more advanced
> configuations.  Reading the '-d' section of 'man mkfs.xfs' tells you
> how to calculate sunit/swidth, su/sw for different array types and
> chunk sizes.

Ok, if I read it right it divides the array into 11 allocation groups,
with a stripe unit of 131072 bytes (128KB) and a stripe width of 3
stripe units.
But how do you know what numbers to use?
Maybe I didn't read the man page carefully enough - if so, I
apologize :-)
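At least I guess that once the filesystem exists and is mounted I
could verify what was actually used with something like:

~$ xfs_info /data

which should show the agcount, sunit and swidth values (assuming /data
is where I mount it).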


> Please read the following very carefully.  IF you did not want a
> single filesystem space across both 4 disk arrays, and the future 12
> disks you may install in that chassis, you CAN format each md array
> with its own XFS filesystem using the defaults.  In this case,
> mkfs.xfs will read the md geometry and create the filesystem with all the
> correct parameters--automatically.  So there's nothing to calculate,
> no confusion.
> 
> However, you don't want 2 or 6 separate filesystems mounted as
> something like:
> 
> /data1
> ...
> /data6
> 
> in your root directory.  You want one big filesystem mounted in your
> root as something like '/data' to create subdirs and put files in,
> without worrying about how much space you have left in each of 6
> filesystems/arrays.  Correct?

Yes, this is very handy :-)
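Just so I'm sure I understand the plan, the whole thing would look
roughly like this (/dev/md2 is taken from your earlier mkfs example;
/dev/md0 and /dev/md1 are just my guesses for the two 4-disk RAID5s):

~$ mdadm --create /dev/md2 --level=linear --raid-devices=2 /dev/md0 /dev/md1
~$ mkfs.xfs -d agcount=11 su=131072,sw=3 /dev/md2
~$ mkdir /data
~$ mount /dev/md2 /data

and then everything lives under /data, no matter on which RAID5 it
physically ends up.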


> The advanced configuration I previously gave you allows for one large
> XFS across all your arrays.  mkfs.xfs is not able to map out the
> complex storage geometry of nested arrays automatically, which is why
> I lifted the lid and showed you the advanced configuration.

Ok, this is very nice!
But will it also work with mixed disk sizes (1.5, 2 and 3 TB drives)?


> With it you'll get a minimum filesystem bandwidth of ~300MB/s per
> single file IO and a maximum of ~600MB/s with 2 or more parallel file
> IOs, with two 4-drive arrays.  Each additional 4 drive RAID5 array
> grown into the md linear array and then into XFS will add ~300MB/s of
> parallel file bandwidth, up to a maximum of ~1.5GB/s.  This should
> far exceed your needs.

This really is enough for my needs :-)

> > And why do both arrays have a stripe width of 384 KB?
> 
> You already know the answer.  You should anyway:
> 
> chunk size            = 128KB

This is what I don't know.
Is this a characteristic of the disk?


> RAID level            = 5
> No. of disks          = 4
> ((4-1)=3) * 128KB     = 384KB

This part I can follow.
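So, if I understand it right, the chunk size is a property of the md
array (chosen when it was created with mdadm), not of the disk itself?
Then I guess I could read it back from an existing array with:

~$ mdadm --detail /dev/md0 | grep 'Chunk Size'

and from that su = chunk size = 128KB = 131072 bytes, sw = number of
data disks = 4 - 1 = 3, giving the 384KB stripe width.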


> > Is it also true that I will get better performance with two hw RAID5
> > arrays?
> 
> Assuming for a moment your drives will work in RAID mode with the
> 9240, which they won't, the answer is no.  Why?  Your CPU cores are
> far faster than the ASIC on the 9240, and the board has no battery
> backed cache RAM to offload write barriers.
> 
> If you step up to one of the higher end full up RAID boards with BBWC,
> and the required enterprise drives, then the answer would be yes up to
> the 20 drives your chassis can hold.   As you increase the drive
> count, at some point md RAID will overtake any hardware RAID card, as
> the 533-800MHz single/dual core RAID ASIC just can't keep up with the
> cores in the host CPU.

Very interesting!


> > What if I lose a complete RAID5 array which was part of the linear
> > array?  Will I lose the whole content of the linear array, as I
> > would with LVM?
> 
> Answer1:  Are you planning on losing an entire RAID5 array?  Planning,
> proper design, and proper sparing prevents this.  If you lose a drive,
> replace it and rebuild IMMEDIATELY.  Keep a spare drive on hand, or
> better yet in standby.  Want to eliminate this scenario?  Use RAID10
> or RAID6, and live with the lost drive space.  And still
> replace/rebuild a dead drive immediately.
> 
> Answer2:  It depends.  If this were to happen, XFS will automatically
> unmount the filesystem.  At that point you run xfs_repair.  If the
> array that died contained the superblock and AG0 you've probably lost
> everything.  If it did not, the repair may simply shrink the
> filesystem and repair any damaged inodes, leaving you with whatever
> was stored on the healthy RAID5 array.

This sounds suitable for my needs.
Just another question: does the linear RAID distribute the data across
the underlying RAID5 arrays?
Or does it fill up the first one and then continue with the second,
and so on?
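And if the worst case from your Answer2 ever happens, I guess the
first step would be a dry run of the repair, something like:

~$ umount /data
~$ xfs_repair -n /dev/md2

(-n only checks and reports, it doesn't change anything on disk),
before running it for real.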


> > I'm still aware that 3 TB raid5 rebuilds take long. 
> 
> 3TB drive rebuilds take forever, period.  As I mentioned, it takes ~8
> hours to rebuild a mirror.
> 
> > Nevertheless I think
> > I will risk using normal (non-green) disks for the next expansion.
> 
> What risk?  Using 'normal' drives will tend to reduce RAID related
> green drive problems.

Ok, I will use normal drives in the future and hope that the green
drives won't all give up at the same time :-/


> > If I'm informed correctly there are not only green drives and normal
> > desktop drives but also server disks with a higher quality than
> > desktop disks.
> 
> Yes, and higher performance.  They're called "enterprise" drives.
> There are many enterprise models: 7.2K SATA/SAS, 10K SATA/SAS, 15K
> SAS, 2.5" and 3.5"
> 
> > But still I don't want to "waste" energy. 
> 
> Manufacturing a single drive consumes as much energy as 4 drives
> running for 3 years.  Green type drives tend to last half as long due
> to all the stop/start cycles wearing out the spindle bearings.  Do
> the math.  The net energy consumption of 'green' drives is therefore
> equal to or higher than 'normal' drives.  The only difference is that
> a greater amount of power is consumed by the drive before you even
> buy it.  The same analysis is true of CFL bulbs.  They consume more
> total energy through their life cycle than incandescents.

Hmm, I knew that argument from hybrid cars but never thought about it
for HDDs.


> > Would the Seagate Barracuda
> > 3TB disks be a better choice?
> 
> Is your 10.5TB full already?  You don't even have the system running
> yet...

No, but I like living in the future ;-)


> > My needs are probably *much* less demanding than yours.
> > Usually it only has to handle read access to the files, plus
> > copying Blu-ray rips onto it.  But most of the time the RAID sits
> > around doing nothing.  MythTV records almost all of the time, but
> > to a non-RAID disk.
> > So I hope that with non-green 3 TB disks I can get some security
> > from the redundancy and still get a lot of disk space.
> 
> If you have a good working UPS, good airflow (that case does), and
> decent quality drives, you shouldn't have to worry much.  I'm unsure
> of the quality of the 3TB Barracuda, haven't read enough about it.
> 
> Are you planning on replacing all your current drives with 4x 3TB
> drives?  Or going with the linear over RAID5 architecture I
> recommended, and adding 4x 3TB drives into the mix?

I'm planning to keep the drives I have now and add 4x 3TB into the mix.
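If I understand the linear-over-RAID5 plan correctly, adding them
later would then look roughly like this (assuming the new 4x 3TB RAID5
becomes /dev/md3, /data is the mount point and the device names are
only placeholders):

~$ mdadm --create /dev/md3 --level=5 --raid-devices=4 --chunk=128 /dev/sd[jklm]
~$ mdadm --grow /dev/md2 --add /dev/md3
~$ xfs_growfs /data

Is that more or less how the expansion would work?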


> > This was exactly what I had in mind in the first place.  But the
> > suggestion from Cameleon was so tempting :-)
> 
> Cameleon helps many people with many Debian/Linux issues and is very
> knowledgeable in many areas.  But I don't recall anyone accusing her
> of being a storage architect. ;)

Her suggestion seemed very tempting because it would give me a RAID6
without having to lose too much storage space.
She really knows a lot, so I was just happy to follow her suggestion
for this setup.


> > Btw I have another question:
> > Is it possible to attach the single (non-RAID) disk I now have in
> > my old server for the MythTV recordings to the LSI controller and
> > still have access to the content when it's configured as JBOD?
> > Since it wouldn't be very bad if I lose these recordings, I'd like
> > to avoid backing them up.
> 
> Drop it in a drive sled, plug it into the backplane, and find out.  If
> you configure it for JBOD the LSI shouldn't attempt writing any
> metadata to it.

Ok, thanks I will do that :-)
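I'll probably check dmesg after plugging it in to see which device
name it gets and then try mounting it read-only first, something like
(sdX just stands for whatever name it shows up as):

~$ dmesg | tail
~$ mount -o ro /dev/sdX1 /mnt

If the recordings are all there I can remount it read-write.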


Again, thanks a lot for all your help and your patience with me.
I'm certainly not always easy ;-)


Cheers
Ramon

