
Re: [OT] 19"/2U Cases





--On August 29, 2007 2:54:31 PM -0700 Mike Bird <mgb-debian@yosemite.net> wrote:

On Wednesday 29 August 2007 13:45, Michael Loftis wrote:
MDRAID certainly isn't reliable in a huge number of failure cases.  It
causes the machine to OOPS/lock up, or even lose data.  MDRAID is also
very difficult to administer, offering only (depending on your version)
mdadm or the raid* tools, and mdadm is rather arcane.  Simple operations
are not well documented: how do I replace a failed drive?  How do I
start a rebuild?  There's no 'rebuild drive' command, and recovery is
completely NON automated, meaning it always takes user intervention to
recover from any failure.  A single I/O error causes MDRAID to mark the
element as failed; it does not even bother to retry.  MDRAID is also
incapable of performing background patrolling reads, something I think
even 3Ware does.  MDRAID RAID5 sets are non-bootable, something you get
from any hardware RAID, even 3Ware.  My money lately has been on LSI's
cards.  3Ware is a good second choice too.  The newer ICP* model
controllers are shit (as opposed to the pre-Intel/Adaptec GDT* series,
which are rock solid).

Software RAID has been reliable for many years.  We use software RAID in
Etch on dozens of systems ranging from small workstations to terabyte
arrays.  When two drives fail in a RAID 5 you'll lose your data - under
software RAID yes but also under hardware RAID.  There's no need for
a 'rebuild drive' because the rebuild starts when the new drive is added
with a command such as "mdadm -a /dev/md0 /dev/sda".  Simple and fast.

On any hardware RAID (at least with a hotswap chassis) you can remove and insert a new drive live, with no intervention, and the RAID takes care of starting the rebuild/re-adding the drive. If you don't have hotswap, you can remove and add a new drive and on the next power-up the RAID takes care of it. Now I might be wrong, but AFAIK Linux does not support SATA hotswap on most controllers. I've seen it mostly work on SCSI systems (you usually have to manually rescan the SCSI bus to get the kernel to update its list of drives). But on FibreChannel, when a loop has an issue the kernel will tend to mark the loop down, and no amount of coaxing short of a reboot will get that loop back into the up state. Just as recently as this week or last week, on a 2.6.18 kernel, MD RAID flipped out on a mirror and marked both drives bad when neither had any detectable issue. This caused the machine to OOPS/panic and stop. Neither drive was faulty.
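For reference, the manual rescan I'm talking about looks something like this (the host number and the channel/id/lun values are placeholders for whatever your controller and slot actually are):

  # 2.6 sysfs interface: rescan SCSI host 0 for added/removed devices
  echo "- - -" > /sys/class/scsi_host/host0/scan
  # older /proc interface: tell the kernel about one specific new device
  echo "scsi add-single-device 0 0 3 0" > /proc/scsi/scsi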

The fact still remains that MDRAID handles errors badly. It doesn't retry reads; instead it assumes that any failed read means the whole partition is dead, and then retries on the partner drive. On a SCSI bus, if you had a momentary issue that retry would likely also fail, and then two drives/partitions are marked bad. Any hardware RAID will give the drive another go before marking it bad (and most will log the soft error).
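And to answer my own "how do I replace a failed drive?" question from above, the usual MD recovery sequence is roughly the following (array and device names are just examples):

  # mark the flaky element failed (if md hasn't already) and pull it out
  mdadm /dev/md0 --fail /dev/sdb1
  mdadm /dev/md0 --remove /dev/sdb1
  # add the replacement partition; this is what kicks off the rebuild
  mdadm /dev/md0 --add /dev/sdb1
  # watch the resync progress
  cat /proc/mdstat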


Hardware RAID can only manage entire drives.  For flexibility and
efficiency software RAID manages partitions.  As you gain experience
with RAID, you'll want different filesystems with different RAID
characteristics within a single system.  As a complex but real-world
example: one can have four drives with /boot 4-way mirrored, swap
consisting of two 2-way mirrors, most of the filesystems in RAID-5,
and a large cache (e.g. squid or netflow) in RAID-0 or LVM PVs.
Hardware RAID is much more expensive, and you have to keep a spare
controller (or motherboard) to recover your data when the original
controller (or motherboard) dies.

Many hardware RAID controllers do support carving up the drives like this, but it's an advanced option found only on higher-end cards. I haven't seen any consumer-level RAID support it, so I'll grant you that one. As far as hardware RAID cards failing: in hundreds of installations I've seen it once, and that was because improper handling of the card caused a hairline fracture in the PCB. Motherboards may be a different story, especially with the huge numbers of bad electrolytic caps out there.
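To be fair to the software side, the layout Mike describes above is easy enough to build with mdadm on partitions. A rough sketch, assuming four identically partitioned disks sda through sdd (device and partition numbers are only examples):

  # /boot: 4-way RAID-1 across the first partition of every disk
  mdadm --create /dev/md0 --level=1 --raid-devices=4 /dev/sd[abcd]1
  # swap: two independent 2-way mirrors
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
  mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdc2 /dev/sdd2
  # main filesystems: RAID-5 across the third partitions
  mdadm --create /dev/md3 --level=5 --raid-devices=4 /dev/sd[abcd]3
  # expendable cache space (squid etc.): RAID-0 across the fourth partitions
  mdadm --create /dev/md4 --level=0 --raid-devices=4 /dev/sd[abcd]4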


I can never recommend any software RAID for anything other than simple
mirrors, and then always with the caveat that it will be a bitch to fix
if things go wrong.  You probably won't lose data, but getting a
software RAID running again is often arcane, especially with MDRAID and
its frequent inability to correctly identify a failed drive (sometimes
the fault of the SATA controller, mind you).  BSD's vinum is little
better in these regards.  And god forbid you lose your boot drive and
have forgotten to keep all the boot blocks on your spare properly
updated.  You also have to manually intervene and reorder drives in
that case, something hardware RAID, any hardware RAID, will
transparently cover.

There are many people who know how to manage software RAID systems
without unnecessarily losing data.  Such people will most likely
have actually used software RAID such as MD with persistent
superblocks and will know that there is nothing to the above FUD
unless they deliberately sabotage the default configurations.  It
sounds like you're trying to hand assemble RAIDs from lists of
drive partitions rather than using UUIDs.  LILO automatically updates
mirrors.  With the current Grub one should grub-install each mirror.

I'd love to see any documentation on any of this. The problems I've described are real world. Despite that, we still have a lot of MD software mirrors in production, because as long as they're working they're cheaper. They take a lot more effort to make work right, though. We use persistent superblocks, and it doesn't alleviate any of these issues. The installations are of about four major 'flavors': RedHat 9, FC3, Debian 3.0 and 3.1, and Debian 4.0. None are immune to the issues. Debian 3 was pretty bad, sometimes not making it past the initrd when a drive failed. The MD setups were all done using the normal TUI tools (anaconda, Debian's installer) during installation.
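For anyone keeping score, assembling by UUID rather than from hand-listed partitions boils down to something like this (Debian path shown, RedHat uses /etc/mdadm.conf, and the UUID below is only an example of what the output looks like):

  # record every running array by UUID so assembly doesn't depend on device order
  mdadm --detail --scan >> /etc/mdadm/mdadm.conf
  # which appends lines along the lines of:
  #   ARRAY /dev/md0 level=raid1 num-devices=2 UUID=3aaa0122:29827cfa:5331ad66:ca767371
  # everything in the config can then be assembled by UUID with
  mdadm --assemble --scan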

And grub-installing on both drives isn't as simple as it sounds, because it only works right if their geometry matches, and the GRUB installer isn't smart enough to notice when it doesn't. Typing grub-install to put a boot block on hd1 (sdb, hdb, whatever it really is) won't necessarily give you a bootable hd1, because if your GRUB config references hd0 partitions and they differ from hd1 in some way, it won't make it to stage 1.5/2, and thus no command prompt. This at least has always been my experience, even with Etch.
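For anyone fighting with this, the workaround people usually suggest is to drop into the grub shell and temporarily map the second disk as (hd0) before running setup, so the paths GRUB embeds resolve the same way whichever disk the BIOS ends up booting. Something like the following, with /dev/sdb and the partition number as examples only:

  # GRUB legacy: from the grub shell, map the second disk as (hd0) and install
  grub> device (hd0) /dev/sdb
  grub> root (hd0,0)
  grub> setup (hd0)
  grub> quit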

The other issue is that no BIOS I know of handles a boot drive that fails in some way other than simply not showing up. And most of the time drives tend to fail in ways that leave them visible to the BIOS but actually unusable. A hardware RAID solves this.

A related issue is that most PC BIOSes have pretty sad serial console support. This means failures will more often require on-site visits if a reboot (for whatever reason) happens after a boot drive failure but before you can get a tech on site. That's a definite issue if you're deploying systems in locations remote from your own, or with difficult access.

Software RAID has caveats; it's not perfect. Hardware RAID has caveats; it's not perfect. Having seen far more issues in the real world with software RAIDs than with hardware RAIDs puts me pretty squarely in the hardware RAID camp.

Software RAID is undoubtedly cheaper as an initial investment. But in our experience (Modwest) it can cost significantly more when it fails, due to undetected errors and poor error recovery behavior (sometimes not the fault of the software RAID itself; many IDE, SATA, and even some SCSI drivers and controllers just do not behave very well when a drive isn't responding properly). It requires significantly more experience and know-how to properly manage and recover from an error. Hardware RAID boils down to 'which drive failed?', 'replace it with the same or a larger drive', and you're done. That can be done by someone with minimal experience and no Unix experience at all. Where hardware RAID does lose is cost and flexibility: you have less choice about exactly how to manage/maintain your data. Many people do not need that much flexibility.
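On the 'undetected errors' point at least, mdadm does ship a monitor mode that will mail out failure and degraded-array events as they happen; a minimal invocation, with the mail address as a placeholder, is something like:

  # run the md monitor as a daemon and mail alerts for Fail/DegradedArray events
  mdadm --monitor --scan --daemonise --mail=root@example.com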

I am not FUDing, as you put it; I am making known my objections to and experience with software RAID. Many people don't have any issues with software RAID. And a software RAID compared to a cheap, bottom-of-the-barrel hardware RAID will usually be faster (especially when you want to do RAID5 or RAID6), possibly more reliable, and will certainly have more bells and whistles.


