
Re: Replacing failed drive in software RAID



Veljko wrote:
> I'm using four 3TB drives, so I had to use GPT. Although I'm pretty
> sure I know what I need to do, I want to make sure so I don't lose
> data. Three drives are dying so I'm going to replace them one by one.

Sounds like a good plan to me.  It is what I would do, and what I
have done before when upgrading to larger drives.

> This is what I plan to do:
> Replacing sda
> ...
> Did I overlook something? Is this going to work?

A very well thought out plan!  It looks okay to me.  I like it.  There
are some boot issues to discuss, however.

Is this a BIOS boot system booting from sda?  In that case the
replacement sda won't have an MBR to boot from.  You can probably use
your BIOS boot menu to select a different disk to boot from, and then,
after having booted, install grub on that other disk.  (Sometimes the
BIOS boot order is quite different from the Linux kernel drive
ordering.)
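Installing grub on a second disk is a one-liner; a sketch, where the
device name is an assumption and should be verified with lsblk first:

```shell
# Write grub's boot code to the second disk so the system can still
# boot if sda is removed.  /dev/sdb is an example device name only.
grub-install /dev/sdb
# Regenerate the grub configuration (Debian/Ubuntu helper).
update-grub
```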

I am unfamiliar with the sgdisk backup and load-backup operations.  I
am not sure they restore the grub boot code.  This isn't too scary,
because you can always boot from one of the other drives, or boot
debian-installer rescue media.  But after setting up the replacement
disk it will probably be necessary to install grub on it in order for
it to be bootable as the first BIOS boot device.
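As I understand it, the sgdisk round trip copies only the GPT
partition table, not the boot code, so the usual sequence would be
something like this (device names are assumptions; double-check which
disk is the source and which is the new one):

```shell
# Save the GPT partition table of a healthy member disk.
sgdisk --backup=sdb-table.bin /dev/sdb
# Write that table to the new replacement disk.
sgdisk --load-backup=sdb-table.bin /dev/sda
# Randomize the disk and partition GUIDs so the copy does not
# collide with the original disk's identifiers.
sgdisk --randomize-guids /dev/sda
# The table backup does not carry boot code, so reinstall grub.
grub-install /dev/sda
```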

And very often I have found that a second disk that I thought had
grub installed on it did not, and that after removing sda the system
won't boot from sdb.  Therefore I normally restore sda, boot, install
grub on sdb, then try again.  If you know this ahead of time you can
re-install grub on sdb and avoid that possible hiccup.  But if you
are concerned about writes to sdb, then I would simply plan to boot
the debian-installer image in rescue mode, assemble the RAID, sync,
then replace sdb, and repeat.  You can always install grub to the
boot sectors after replacing the suspect disks.  Hopefully this makes
sense.
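For reference, the per-disk replacement cycle itself would look
roughly like this; a sketch only, where the array and partition names
are assumptions that must match your setup:

```shell
# Mark the failing member as failed and remove it from the array.
mdadm /dev/md0 --fail /dev/sda1 --remove /dev/sda1
# Power down, physically swap the drive, partition it, then add
# the new member back into the array.
mdadm /dev/md0 --add /dev/sda1
# Watch the rebuild progress; wait for the resync to finish
# before touching the next disk.
cat /proc/mdstat
```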

> I was also thinking about inserting one drive and copying data from
> RAID to it so I have backup if something goes wrong. Would that be
> right thing to do, or that would just load drives unnecessarily and
> accelerate their failure?

Are you asking whether the one inserted drive is large enough for a
full system backup?  If so, then I think it is hard to argue against
a full backup.  I would do the full backup even with the extra disk
activity.  It is read activity, not writes, and so not as stressful
as normal read-write disk activity.

In that case you might consider that, instead of replacing the disks
one by one, you could simply do a full backup, create the new system
with LVM and RAID as desired, and then restore the backup onto the
newly constructed partitions.  Once you have the full backup, your
original drives could be shut off and kept as a backup image too.  So
that also seems a very safe operation.
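A minimal full-backup sketch with rsync, where the spare-disk device
and mount point are assumptions:

```shell
# Mount the spare disk and copy the root filesystem to it,
# preserving permissions, ownership, hard links, ACLs and xattrs.
# --one-file-system keeps /proc, /sys, etc. out of the copy.
mount /dev/sde1 /mnt/backup
rsync -aHAX --one-file-system / /mnt/backup/
```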

Or, since you have four new drives, go ahead and construct a new base
configuration on them with LVM+RAID as desired, and then clone
directly from the old system disks to the new ones.  Then boot the
new system disks.  This involves much more offline time than the
replace-one-disk-at-a-time approach you outlined above.  I normally
sync one disk at a time, since the system stays online and runs
services normally during the sync.  But there are many ways to
accomplish the task.
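Building the new four-disk stack might look like the following; the
RAID level, array name, volume-group name and sizes are all
assumptions to adapt to your layout:

```shell
# Create a RAID5 array from the four new drives' first partitions.
mdadm --create /dev/md1 --level=5 --raid-devices=4 \
    /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
# Layer LVM on top of the array.
pvcreate /dev/md1
vgcreate vg0 /dev/md1
lvcreate -L 100G -n root vg0
# Make a filesystem on the new logical volume.
mkfs.ext4 /dev/vg0/root
```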

Bob


