Re: [OT] 19"/2U Cases
--On August 29, 2007 2:54:31 PM -0700 Mike Bird <email@example.com> wrote:
On Wednesday 29 August 2007 13:45, Michael Loftis wrote:
MDRAID certainly isn't reliable in a huge number of failure cases. It
causes the machine to OOPS or lock up, or even lose data. MDRAID is also
very difficult to administer, offering only (depending on your version)
the mdadm or raid* tools. mdadm is rather arcane, and simple operations are
not well documented: how do I replace a failed drive? How do I start a
rebuild? There's no 'rebuild drive' option, and it's completely NON-automated
too, meaning it always takes user intervention to recover from any
failure. A single I/O error causes MDRAID to mark the element as
failed; it doesn't even bother to retry. MDRAID is also incapable of
performing background patrol reads, something I think even 3Ware
does. MDRAID RAID5 sets are not bootable, something you get from any
hardware RAID, even 3Ware. My money lately has been on LSI's cards.
3Ware is a good second choice too. The newer ICP* model ICP
controllers are shit (as opposed to the pre-Intel/Adaptec GDT* series,
which are rock solid).
Software RAID has been reliable for many years. We use software RAID in
Etch on dozens of systems ranging from small workstations to terabyte
arrays. When two drives fail in a RAID 5 you'll lose your data: under
software RAID, yes, but also under hardware RAID. There's no need for
a 'rebuild drive' because the rebuild starts when the new drive is added
with a command such as "mdadm -a /dev/md0 /dev/sda". Simple and fast.
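A toy Python model of a single-parity stripe shows why a second failure is fatal in RAID 5, whether the parity is computed in software or on a controller. It is a sketch under simplifying assumptions. Real RAID 5 rotates parity across members and works on whole chunks. The helper names (xor_blocks, make_stripe, recover) are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Toy RAID 5 stripe: the data blocks followed by one XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe, missing):
    """Rebuild a stripe given the set of indexes of lost members.

    One lost member equals the XOR of all survivors. With two or more
    lost, single parity no longer determines the data.
    """
    if len(missing) > 1:
        raise ValueError("more than one member lost: data is unrecoverable")
    rebuilt = list(stripe)
    for lost in missing:
        survivors = [blk for i, blk in enumerate(stripe) if i != lost]
        rebuilt[lost] = xor_blocks(survivors)
    return rebuilt


# Three data "drives" plus parity; lose drive 1 and rebuild it.
stripe = make_stripe([b"abcd", b"efgh", b"ijkl"])
print(recover(stripe, {1})[1])  # b'efgh'
```

Losing any two members (say {0, 2}) raises the error. No placement of the parity, in software or hardware, changes that.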
On any hardware RAID (at least with a hotswap chassis) you can remove a
drive and insert a new one, live, with no intervention, and the RAID takes
care of starting the rebuild and re-adding the drive. If you don't have
hotswap you can remove and add a new drive, and on the next power-up the
RAID takes care of it. Now I might be wrong, but AFAIK Linux does not
support SATA hotswap on most controllers. I've seen it mostly work on SCSI
systems (you usually have to manually rescan the SCSI bus to get the kernel
to update its list of drives). But on Fibre Channel, when a loop has an
issue the kernel will tend to mark the loop down, and no amount of coaxing
short of a reboot will get that loop back into the up state. As recently as
this week or last week, on a 2.6.18 kernel, MD RAID flipped out on a mirror
and marked both drives bad when neither had any detectable issue. This
caused the machine to OOPS/panic and stop. Neither drive was faulty.
The fact still remains that MDRAID handles errors badly. It doesn't retry
reads. Instead it assumes that any failed read means the whole partition
is dead, and then retries on the partner drive. On a SCSI bus, if you had a
momentary issue, that read would likely also fail, and now two
drives/partitions are marked bad. Any hardware RAID will give it another go
before marking something as bad (and most will log the soft error).
Hardware RAID can only manage entire drives. For flexibility and
efficiency, software RAID manages partitions. As you gain experience
with RAID, you'll want different filesystems with different RAID
characteristics within a single system. As a complex but real-world
example: one can have four drives with /boot 4-way mirrored, swap
consisting of two 2-way mirrors, most of the filesystems in RAID-5,
and a large cache (e.g. squid or netflow) in RAID-0 or LVM PVs.
Hardware RAID is much more expensive, and you have to keep a spare
controller (or motherboard) to recover your data when the original
controller (or motherboard) dies.
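A few lines of Python capture the arithmetic behind the flexibility argument for the four-drive layout described above. The partition sizes are invented for illustration. Only the per-level rules are standard:

- an n-way mirror yields one member's capacity
- RAID-5 gives up one member's worth to parity
- RAID-0 keeps everything but has no redundancy

```python
def usable(level, members, part_size):
    """Usable capacity of one array built from equal-size partitions."""
    if level == "raid1":
        return part_size                  # every member holds a full copy
    if level == "raid0":
        return members * part_size        # striped, no redundancy
    if level == "raid5":
        if members < 3:
            raise ValueError("RAID-5 needs at least 3 members")
        return (members - 1) * part_size  # one member's worth goes to parity
    raise ValueError(f"unsupported level: {level}")


# Hypothetical partition sizes (GB) on each of four identical drives.
LAYOUT = [
    # (purpose, level, members, partition size per drive)
    ("/boot", "raid1", 4, 0.5),
    ("swap A", "raid1", 2, 2),   # drives 1 and 2
    ("swap B", "raid1", 2, 2),   # drives 3 and 4
    ("filesystems", "raid5", 4, 200),
    ("cache", "raid0", 4, 50),
]

raw = sum(members * size for _, _, members, size in LAYOUT)
for purpose, level, members, size in LAYOUT:
    print(f"{purpose:12} {level}: {usable(level, members, size)} GB usable")
print(f"raw {raw} GB, usable {sum(usable(l, m, s) for _, l, m, s in LAYOUT)} GB")
```

Each filesystem gets the redundancy/capacity trade-off it needs. That per-partition choice is the point being made about software RAID.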
Many hardware RAID controllers support partitioning like this, but it's an
advanced option found only on higher-end cards. I haven't seen any
consumer-level RAIDs support this, so you have that one for sure. As for
hardware RAID cards failing: in hundreds of installations I've seen it
once, and that was because improper handling of the card caused a
hairline fracture in the PCB. Motherboards may be a different story,
especially with the huge numbers of bad electrolytic caps out there.
I can never recommend any software RAID for anything other than simple
mirrors, and even then always with the caveat that it will be a bitch to
fix if things go wrong. You probably won't lose data, but getting a software
RAID running again is often arcane, especially with MDRAID and its
frequent inability to correctly identify a failed drive (sometimes the
fault of the SATA controller, mind you). BSD's vinum is little better in
these regards. And god forbid you lose your boot drive and have
forgotten to keep all the boot blocks on your spare properly updated.
You also have to manually intervene and reorder drives in that case,
something hardware RAID, any hardware RAID, will transparently cover.
There are many people who know how to manage software RAID systems
without unnecessarily losing data. Such people will most likely
have actually used software RAID such as MD with persistent
superblocks, and will know that there is nothing to the above FUD
unless they deliberately sabotage the default configurations. It
sounds like you're trying to hand-assemble RAIDs from lists of
drive partitions rather than using UUIDs. LILO automatically updates
mirrors. With the current GRUB one should run grub-install on each mirror.
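A toy Python model illustrates the point about UUIDs. Arrays here are assembled from the UUID recorded in each member's persistent superblock. A drive dropping out and the remaining disks being renamed does not change which partitions belong together. This is not mdadm's code. The superblock dictionaries, the 'slot' field and the uuid-A/uuid-B values are hypothetical stand-ins.

```python
from collections import defaultdict


def assemble_by_uuid(superblocks):
    """Group devices into arrays by array UUID, ordered by their slot.

    `superblocks` maps a device name to a dict with the hypothetical keys
    'uuid' (array UUID) and 'slot' (member's role number). The device name
    itself plays no part in deciding membership or order.
    """
    arrays = defaultdict(list)
    for device, sb in superblocks.items():
        arrays[sb["uuid"]].append((sb["slot"], device))
    return {uuid: [dev for _, dev in sorted(members)]
            for uuid, members in arrays.items()}


# The first disk dies and the survivors shift down a letter at the next boot:
# the old sdb1 is now sda1, the old sdc1 is now sdb1, and so on.
after_failure = {
    "sda1": {"uuid": "uuid-A", "slot": 1},
    "sdb1": {"uuid": "uuid-B", "slot": 0},
    "sdc1": {"uuid": "uuid-B", "slot": 1},
}
print(assemble_by_uuid(after_failure))
```

Compare a hand-written list such as "md0 = sda1 sdb1". After the same renaming, it would pull the old sdc1, a member of the other array, into md0. Grouping by UUID avoids that.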
I'd love to see any documentation on any of this. The problems I've
described are real world. Despite that, we still have a lot of MD software
mirrors in production because, as long as they're working, they're cheaper.
They take a lot more effort to make work right, though. We use persistent
superblocks, and they don't alleviate any of these issues. The
installations come in about four major 'flavors': Red Hat 9, FC3, Debian
3.0 and 3.1, and Debian 4.0, and none are immune to the issues. Debian 3 was
pretty bad, sometimes not making it past the initrd when a drive failed.
The MD setups were all done using the normal TUI installer tools (Anaconda,
Debian's system installer) during installation.
And grub-installing on both drives isn't as simple as it sounds, because it
only works right if their geometry matches. The GRUB installer isn't smart
enough to figure out if things don't match. Running grub-install to put
a boot block on hd1 (sdb, hdb, whatever it really is) won't necessarily
give you a bootable hd1, because if your GRUB config references hd0
partitions, and they differ from hd1 in some way, it won't make it to
stage 2/2.5, and thus you get no command prompt. That, at least, has always
been my experience, even with Etch.
The other issue is that no BIOS I know of will cope if the boot drive
fails in some way other than simply not showing up, and most of the time
drives tend to fail in ways that leave them showing up to the BIOS while
actually being unusable. A hardware RAID solves this.
A related issue is that most PC BIOSes have pretty sad
serial console support. This means that failures will more often require
onsite visits if a reboot (for whatever reason) happens after a boot drive
failure but before you can get a tech on site. This is a definite issue if
you're deploying systems in locations remote to your own, or with difficult
Software RAID has caveats; it's not perfect. Hardware RAID has caveats;
it's not perfect. Having seen far more issues in the real world with
software RAIDs than with hardware RAIDs puts me pretty squarely in the
hardware RAID camp.
Software RAID is undoubtedly cheaper in initial investment. But in
our experience (Modwest) it can cost significantly more when it fails, due
to undetected errors and poor error recovery behavior (sometimes not the
fault of software RAID: many IDE, SATA, and even some SCSI drivers and
controllers just do not behave very well when a drive isn't responding
properly). It requires significantly more experience and know-how to
properly manage and recover from an error. Hardware RAID boils down to
'which drive failed?', 'replace it with the same or a larger drive', and
you're done. This can be done by someone with only minimal experience and
no Unix experience at all. Where hardware RAID does lose is on cost and
flexibility: you have less choice as to exactly how to manage/maintain your
data with hardware RAID. Many people do not need that much flexibility.
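On the software side, "which drive failed?" is usually answered from /proc/mdstat. Below is a minimal Python sketch that pulls failed and missing members out of it. It assumes the simple layout in the sample:

- a member tagged (F) has been failed
- [2/1] [U_] means one of two members is active

mdstat output varies between kernel versions and RAID levels, so this is illustrative only. degraded_arrays is a hypothetical helper, not an existing tool.

```python
import re

# "sdb1[1](F)" -> device, role number, optional failed flag
MEMBER_RE = re.compile(r"(\w+)\[(\d+)\](\(F\))?")
# "[2/1] [U_]" -> members wanted, members active, per-slot up/down map
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return {array: [failed members]} for every array not fully active."""
    report = {}
    current, failed = None, []
    for line in mdstat_text.splitlines():
        if line.startswith("md"):
            # e.g. "md0 : active raid1 sdb1[1](F) sda1[0]"
            name, _, members = line.partition(" : ")
            current = name.strip()
            failed = [m.group(1) for m in MEMBER_RE.finditer(members) if m.group(3)]
        elif current:
            status = STATUS_RE.search(line)
            if status:
                wanted, active = int(status.group(1)), int(status.group(2))
                if active < wanted or failed:
                    report[current] = failed
                current = None  # done with this array's header/status pair
    return report


SAMPLE = """Personalities : [raid1]
md0 : active raid1 sdb1[1](F) sda1[0]
      976640 blocks [2/1] [U_]

md1 : active raid1 sdd1[1] sdc1[0]
      976640 blocks [2/2] [UU]

unused devices: <none>
"""
print(degraded_arrays(SAMPLE))
```

An array that is degraded because a member was removed rather than marked (F) is still reported, just with an empty list of failed members.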
I am not FUDing, as you put it; I am making known my objections to and
experience with software RAID. Many people don't have any issues with
software RAID. And a software RAID, compared to a cheap, bottom-of-the-barrel
hardware RAID, will usually be faster, especially when you want to do RAID5
and RAID6, and possibly more reliable, and will certainly have more bells