
Re: [OT] 19"/2U Cases





--On August 29, 2007 2:54:31 PM -0700 Mike Bird <mgb-debian@yosemite.net> wrote:

On Wednesday 29 August 2007 13:45, Michael Loftis wrote:
MDRAID certainly isn't reliable in a huge number of failure cases.  It
causes the machine to OOPS/lock up, or even lose data.  MDRAID is also
very difficult to administer, offering only (depending on your version)
mdadm or the raid* tools, and mdadm is rather arcane.  Simple operations
are not well documented: how do I replace a failed drive?  How do I
start a rebuild?  There's no 'rebuild drive' command, and recovery is
completely NON automated, meaning it always takes user intervention to
recover from any failure.  A single I/O error causes MDRAID to mark the
element as failed; it does not even bother to retry.  MDRAID is also
incapable of performing background patrolling reads, something I think
even 3Ware does.  MDRAID RAID5 sets are non-bootable, something you get
from any hardware RAID, even 3Ware.  My money lately has been on LSI's
cards.  3Ware is a good second choice too.  The newer ICP* model
controllers are shit (as opposed to the pre-Intel/Adaptec GDT* series,
which are rock solid).

Software RAID has been reliable for many years.  We use software RAID in
Etch on dozens of systems ranging from small workstations to terabyte
arrays.  When two drives fail in a RAID 5 you'll lose your data - under
software RAID yes but also under hardware RAID.  There's no need for
a 'rebuild drive' because the rebuild starts when the new drive is added
with a command such as "mdadm -a /dev/md0 /dev/sda".  Simple and fast.

On any hardware RAID (at least with a hotswap chassis) you can remove and insert a new drive live, with no intervention, and the RAID takes care of starting the rebuild/re-adding the drive. If you don't have hotswap, you can remove and add a new drive and on the next power-up the RAID takes care of it. Now I might be wrong, but AFAIK Linux does not support SATA hotswap on most controllers. I've seen it mostly work on SCSI systems (you usually have to manually rescan the SCSI bus to get the kernel to update its list of drives). But on FibreChannel, when a loop has an issue the kernel will tend to mark the loop down, and no amount of coaxing short of a reboot will get that loop back into the up state. Just as recently as this week or last week, on a 2.6.18 kernel, MD RAID flipped out on a mirror and marked both drives bad when neither had any detectable issue. This caused the machine to OOPS/panic and stop. Neither drive was faulty.
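For reference, the manual rescan I'm talking about looks something like this (the host number and the channel/id/lun values are placeholders for whatever your controller and slot actually are):

  # 2.6 sysfs interface: rescan SCSI host 0 for added/removed devices
  echo "- - -" > /sys/class/scsi_host/host0/scan
  # older /proc interface: tell the kernel about one specific new device
  echo "scsi add-single-device 0 0 3 0" > /proc/scsi/scsi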

The fact still remains that MDRAID handles errors badly. It doesn't retry reads; instead it assumes that any failed read means the whole partition is dead, and then retries on the partner drive. On a SCSI bus, if you had a momentary issue that retry would likely also fail, and then two drives/partitions are marked bad. Any hardware RAID will give the drive another go before marking it bad (and most will log the soft error).
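And to answer my own "how do I replace a failed drive?" question from above, the usual MD recovery sequence is roughly the following (array and device names are just examples):

  # mark the flaky element failed (if md hasn't already) and pull it out
  mdadm /dev/md0 --fail /dev/sdb1
  mdadm /dev/md0 --remove /dev/sdb1
  # add the replacement partition; this is what kicks off the rebuild
  mdadm /dev/md0 --add /dev/sdb1
  # watch the resync progress
  cat /proc/mdstat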


Hardware RAID can only manage entire drives.  For flexibility and
efficiency software RAID manages partitions.  As you gain experience
with RAID, you'll want different filesystems with different RAID
characteristics within a single system.  As a complex but real-world
example: one can have four drives with /boot 4-way mirrored, swap
consisting of two 2-way mirrors, most of the filesystems in RAID-5,
and a large cache (e.g. squid or netflow) in RAID-0 or LVM PVs.
Hardware RAID is much more expensive, and you have to keep a spare
controller (or motherboard) to recover your data when the original
controller (or motherboard) dies.

Many hardware RAID controllers do support carving up the drives like this, but it's an advanced option found only on higher-end cards. I haven't seen any consumer-level RAID support it, so I'll grant you that one. As far as hardware RAID cards failing: in hundreds of installations I've seen it once, and that was because improper handling of the card caused a hairline fracture in the PCB. Motherboards may be a different story, especially with the huge numbers of bad electrolytic caps out there.
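To be fair to the software side, the layout Mike describes above is easy enough to build with mdadm on partitions. A rough sketch, assuming four identically partitioned disks sda through sdd (device and partition numbers are only examples):

  # /boot: 4-way RAID-1 across the first partition of every disk
  mdadm --create /dev/md0 --level=1 --raid-devices=4 /dev/sd[abcd]1
  # swap: two independent 2-way mirrors
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
  mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdc2 /dev/sdd2
  # main filesystems: RAID-5 across the third partitions
  mdadm --create /dev/md3 --level=5 --raid-devices=4 /dev/sd[abcd]3
  # expendable cache space (squid etc.): RAID-0 across the fourth partitions
  mdadm --create /dev/md4 --level=0 --raid-devices=4 /dev/sd[abcd]4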


I can never recommend any software RAID for anything other than simple
mirrors, and then always with the caveat that it will be a bitch to fix
if things go wrong.  You probably won't lose data, but getting a
software RAID running again is often arcane, especially with MDRAID and
its frequent inability to correctly identify a failed drive (sometimes
the fault of the SATA controller, mind you).  BSD's vinum is little
better in these regards.  And god forbid you lose your boot drive and
have forgotten to keep all the boot blocks on your spare properly
updated.  You also have to manually intervene and reorder drives in
that case, something hardware RAID, any hardware RAID, will
transparently cover.

There are many people who know how to manage software RAID systems
without unnecessarily losing data.  Such people will most likely
have actually used software RAID such as MD with persistent
superblocks and will know that there is nothing to the above FUD
unless they deliberately sabotage the default configurations.  It
sounds like you're trying to hand assemble RAIDs from lists of
drive partitions rather than using UUIDs.  LILO automatically updates
mirrors.  With the current Grub one should grub-install each mirror.

I'd love to see any documentation on any of this. The problems I've described are real world. Despite that, we still have a lot of MD software mirrors in production, because as long as they're working they're cheaper. They take a lot more effort to make work right, though. We use persistent superblocks, and it doesn't alleviate any of these issues. The installations are of about four major 'flavors': RedHat 9, FC3, Debian 3.0 and 3.1, and Debian 4.0. None are immune to the issues. Debian 3 was pretty bad, sometimes not making it past the initrd when a drive failed. The MD setups were all done using the normal TUI tools (anaconda, Debian's installer) during installation.
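For anyone keeping score, assembling by UUID rather than from hand-listed partitions boils down to something like this (Debian path shown, RedHat uses /etc/mdadm.conf, and the UUID below is only an example of what the output looks like):

  # record every running array by UUID so assembly doesn't depend on device order
  mdadm --detail --scan >> /etc/mdadm/mdadm.conf
  # which appends lines along the lines of:
  #   ARRAY /dev/md0 level=raid1 num-devices=2 UUID=3aaa0122:29827cfa:5331ad66:ca767371
  # everything in the config can then be assembled by UUID with
  mdadm --assemble --scan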

And grub-installing on both drives isn't as simple as it sounds, because it only works right if their geometry matches, and the GRUB installer isn't smart enough to notice when it doesn't. Typing grub-install to put a boot block on hd1 (sdb, hdb, whatever it really is) won't necessarily give you a bootable hd1, because if your GRUB config references hd0 partitions and they differ from hd1 in some way, it won't make it to stage 1.5/2, and thus no command prompt. This at least has always been my experience, even with Etch.
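For anyone fighting with this, the workaround people usually suggest is to drop into the grub shell and temporarily map the second disk as (hd0) before running setup, so the paths GRUB embeds resolve the same way whichever disk the BIOS ends up booting. Something like the following, with /dev/sdb and the partition number as examples only:

  # GRUB legacy: from the grub shell, map the second disk as (hd0) and install
  grub> device (hd0) /dev/sdb
  grub> root (hd0,0)
  grub> setup (hd0)
  grub> quit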

The other issue is that no BIOS I know of handles a boot drive that fails in some way other than simply not showing up. And most of the time drives tend to fail in ways that leave them visible to the BIOS but actually unusable. A hardware RAID solves this.

A related issue is that most PC BIOSes have pretty sad serial console support. This means failures will more often require on-site visits if a reboot (for whatever reason) happens after a boot drive failure but before you can get a tech on site. That's a definite issue if you're deploying systems in locations remote from your own, or with difficult access.

Software RAID has caveats; it's not perfect. Hardware RAID has caveats; it's not perfect. Having seen far more issues in the real world with software RAIDs than with hardware RAIDs puts me pretty squarely in the hardware RAID camp.

Software RAID is undoubtedly cheaper as an initial investment. But in our experience (Modwest) it can cost significantly more when it fails, due to undetected errors and poor error recovery behavior (sometimes not the fault of the software RAID itself; many IDE, SATA, and even some SCSI drivers and controllers just do not behave very well when a drive isn't responding properly). It requires significantly more experience and know-how to properly manage and recover from an error. Hardware RAID boils down to 'which drive failed?', 'replace it with the same or a larger drive', and you're done. That can be done by someone with minimal experience and no Unix experience at all. Where hardware RAID does lose is cost and flexibility: you have less choice about exactly how to manage/maintain your data. Many people do not need that much flexibility.
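On the 'undetected errors' point at least, mdadm does ship a monitor mode that will mail out failure and degraded-array events as they happen; a minimal invocation, with the mail address as a placeholder, is something like:

  # run the md monitor as a daemon and mail alerts for Fail/DegradedArray events
  mdadm --monitor --scan --daemonise --mail=root@example.com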

I am not FUDing, as you put it; I am making known my objections to and experience with software RAID. Many people don't have any issues with software RAID. And a software RAID compared to a cheap, bottom-of-the-barrel hardware RAID will usually be faster (especially when you want to do RAID5 or RAID6), possibly more reliable, and will certainly have more bells and whistles.


