
Re: Replacing failed drive in software RAID



On 10/31/2013 3:41 PM, Bob Proulx wrote:
> Veljko wrote:
>> I'm using four 3TB drives, so I had to use GPT. Although I'm pretty
>> sure I know what I need to do, I want to make sure so I don't lose
>> data. Three drives are dying, so I'm going to replace them one by one.
> 
> Sounds like a good plan to me.  It is what I would do.  It is what I
> have done before when upgrading sizes to larger sizes.
> 
>> This is what I plan to do:
>> Replacing sda
>> ...
>> Did I overlook something? Is this going to work?
> 
> Very well thought out plan!  Looks okay to me.  I like it.  Some boot
> issues to discuss however.
> 
> Is this a BIOS system that boots from sda by boot order?  In that
> case the replacement sda won't have an MBR to boot from.  You can
> probably use the BIOS boot menu to select a different disk, and then,
> after having booted, install grub on the replacement.  (Sometimes the
> BIOS boot order will be quite different from the Linux kernel drive
> ordering.)
> 
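For what it's worth, putting grub on the remaining disks ahead of time
is quick.  A sketch, assuming GRUB 2 and the usual /dev/sdX names
(adjust for your layout; on GPT each drive also needs a small bios_grub
partition for BIOS-mode grub-install to embed into):

  # install grub's boot code on each remaining member disk
  grub-install /dev/sdb
  grub-install /dev/sdc
  grub-install /dev/sdd
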
> I am unfamiliar with the sgdisk backup and load-backup operation.  I
> am not sure that will restore the grub boot sector.  This isn't too
> scary because you can always boot one of the other drives.  Or boot a
> debian-install rescue media.  But after setting up the replacement
> disk it will probably be necessary to install grub upon it in order
> for it to be bootable as the first BIOS boot media.
> 
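Note that sgdisk's backup/load-backup copies only the partition table,
not the contents of any partition, so grub's core image (which lives in
the bios_grub partition on a BIOS+GPT setup) still has to be put back
with grub-install.  A sketch, with example file and device names:

  # save the GPT from the old disk before pulling it
  sgdisk --backup=sda-table.gpt /dev/sda
  # after swapping in the replacement, restore the table
  sgdisk --load-backup=sda-table.gpt /dev/sda
  # give the replacement its own disk and partition GUIDs
  sgdisk -G /dev/sda
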
> And very often I have found that a second disk that I thought should
> have had grub installed upon it did not, and after removing sda the
> system would not grub boot from sdb.  Therefore I normally
> restore sda, boot, install grub on sdb, then try again.  But if you
> know ahead of time you can re-install grub on sdb and avoid that
> possible hiccup.  If you are concerned about writes to sdb
> then I would simply plan to boot from the debian-installer image in
> rescue mode, assemble the raid, sync, then replace sdb, and repeat.
> You can always install grub to the boot sectors after replacing the
> suspect disks.  Hopefully this makes sense.

This is precisely why I use hardware RAID HBAs for boot disks (and most
often for data disks as well).  The HBA's BIOS makes booting transparent
after a drive failure.  In addition you have only one array (hardware)
instead of 3 (mdraid), and only 3 partitions to create instead of 9,
since they reside on top of the single array device rather than being
used to build multiple software array devices.  So you have one /boot,
one root fs, one data filesystem, and only one MBR to maintain.  The
RAID controller literally turns your 4 drives into one, unlike soft
RAID.

The 4-port Adaptec is cheap, under $200 USD, and a perfect fit for 4 drives:
http://www.adaptec.com/en-us/products/series/6e/
http://www.newegg.com/Product/Product.aspx?Item=N82E16816103229

And because it has 128MB of cache you get a small performance boost.

>> I was also thinking about inserting one drive and copying data from
>> the RAID to it so I have a backup if something goes wrong. Would that
>> be the right thing to do, or would that just load the drives
>> unnecessarily and accelerate their failure?
> 
> Are you asking whether the one inserted drive would be large enough
> to do a full system backup?  If so it is hard to argue against a
> full backup.  I think I would do the full backup even with the extra
> disk activity.  It is read, not write, and so not as bad as normal
> read-write disk activity.

Agreed.
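
For the backup pass itself, something like this works (a sketch,
assuming the spare drive is mounted at /mnt/backup; -x stays on one
filesystem, so repeat for /boot and the data filesystem):

  # full copy preserving hardlinks, ACLs and extended attributes
  rsync -aHAXx / /mnt/backup/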

> In which case you might consider that instead of replacing all disks
> one by one you could simply do a full backup, then create the new
> system with lvm and raid as desired, and then restore the backup onto
> the newly constructed partitions.  After you have the full backup then
> your original drives would be shut off and available as a backup image
> too in that case.  So that also seems a very safe operation.

This is my preferred method.  Cleaner, simpler.  Still not as simple as
moving to hardware RAID though.
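
If you take that route, the rebuild on the four new drives is roughly
this (a sketch with example names; md0, vg0, the partition numbers and
the RAID level are placeholders for whatever layout you actually want):

  # one array across matching partitions on the four new drives
  mdadm --create /dev/md0 --level=10 --raid-devices=4 \
      /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
  # LVM on top of the array
  pvcreate /dev/md0
  vgcreate vg0 /dev/md0
  lvcreate -L 20G -n root vg0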

> Or since you have four new drives, go ahead and construct a new base
> configuration on them with lvm+raid as desired, and
> then clone directly from the old system disks to the new system
> disks.  Then boot the new system disks.  This has much more offline
> time than the one-disk-at-a-time replacement you outlined above.  I
> normally sync one disk at a time since the system is online and
> running services normally during the sync.  But there are many ways to
> accomplish the task.

And yes, there is more downtime with this method.
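
For the one-at-a-time variant, each swap cycle looks something like
this (a sketch for one member of one array; with three md arrays you
repeat it per array, and wait for each resync to finish before pulling
the next disk):

  # mark the dying member failed and pull it from the array
  mdadm /dev/md0 --fail /dev/sda1 --remove /dev/sda1
  # physically swap the disk and restore its GPT, then re-add
  mdadm /dev/md0 --add /dev/sda1
  # do not touch the next disk until the resync completes
  watch cat /proc/mdstat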

-- 
Stan

