
Linux software RAID with hot-swap hardware



I've written a document on using Linux software RAID with hot-swap SCSI 
hardware.

It's slightly specific to the hardware I use (I wrote it for internal use) but 
can easily be adapted to be more generic.

If someone wants to add it to a HOWTO or something then be my guest; please 
give me appropriate credit.

-- 
http://www.coker.com.au/selinux/   My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/  Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/    Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/  My home page
Title: Linux Software RAID and hot-swap SCSI

Basics of Linux Software RAID

The status of a running software RAID in Linux can be obtained from /proc/mdstat; here's a sample:
md1 : active raid1 sdb1[1] sda1[0]
      1999936 blocks [2/2] [UU]
This is for a software RAID (Meta-Disk) device named /dev/md1, which comprises the devices /dev/sda1 and /dev/sdb1 in a RAID-1 (mirroring) setup. The Dell 1U server machines will have all their disks in software RAID-1 arrays.
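For reference, arrays like this are described to the raidtools in /etc/raidtab. A minimal sketch for the /dev/md1 array above, assuming the usual raidtools syntax, would be:

raiddev /dev/md1
        raid-level              1
        nr-raid-disks           2
        nr-spare-disks          0
        chunk-size              4
        persistent-superblock   1
        device                  /dev/sda1
        raid-disk               0
        device                  /dev/sdb1
        raid-disk               1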

When you have a disk installed and recognised by Linux, you can add partitions to degraded RAID arrays at any time with the raidhotadd command. Here is the data you see in /proc/mdstat for a degraded RAID array:

md1 : active raid1 sdb1[1](F) sda1[0]
      1999936 blocks [2/1] [U_]
Device /dev/sdb1 has failed (to generate this error I unplugged the disk /dev/sdb). Now I have just swapped the hard drive (see below) and want to put the new drive back in the array. First I must remove the record for the disk in the failed (F) state with the command raidhotremove /dev/md1 /dev/sdb1, which gives the following state in /proc/mdstat:
md1 : active raid1 sda1[0]
      1999936 blocks [2/1] [U_]
Once this has been done for every partition that was in a RAID array, the drive will be regarded as unused by Linux, which allows it to be repartitioned or unregistered (see below for details of hardware recognition).
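For example, if the failed drive also had a partition in a second array, the removal would look like this (/dev/md0 and /dev/sdb2 are hypothetical here):

raidhotremove /dev/md0 /dev/sdb2      # hypothetical second array on the same disk
raidhotremove /dev/md1 /dev/sdb1
cat /proc/mdstat                      # sdb partitions should no longer be listed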

If you want to instruct the software RAID driver to stop using a partition on a functional disk then use the raidsetfaulty command, e.g. raidsetfaulty /dev/md1 /dev/sdb1, to put the partition into the failed state so that you can then use raidhotremove to remove it.
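So the full sequence for taking a working partition out of an array is:

raidsetfaulty /dev/md1 /dev/sdb1      # mark the partition as failed
raidhotremove /dev/md1 /dev/sdb1      # remove the failed record from the array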

When you have a new partition you want to add to a RAID set you can use the command raidhotadd ARRAY DEVICE, e.g. raidhotadd /dev/md1 /dev/sdb1, which results in the following data in /proc/mdstat:

md1 : active raid1 sdb1[2] sda1[0]
      1999936 blocks [2/1] [U_]
      [=======>.............]  recovery = 37.8% (755976/1999936) finish=0.4min speed=48732K/sec
Note that the [2] after the device name (an index one past the end of a two-disk array) indicates that the device is being reconstructed.
When running raidhotadd commands there is no need to wait for one command to finish before running the next; the kernel maintains a queue of devices to reconstruct. You can schedule several RAID partitions for reconstruction and then go for a coffee break (or a lunch break, depending on the speed of the drives).
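For example, to queue the reconstruction of two partitions on the replaced drive and then watch the progress (/dev/md0 and /dev/sdb2 are hypothetical here):

raidhotadd /dev/md0 /dev/sdb2
raidhotadd /dev/md1 /dev/sdb1
watch cat /proc/mdstat                # the arrays rebuild one after the other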

Hardware recognition

The drives are hot-swap, so you can unplug a disk at any time. However, you must inform Linux that you have removed a disk before the new disk can be recognised. Before you can do this you have to make sure that the disk is no longer regarded as "in use" by Linux; see the above section for information on the raidsetfaulty and raidhotremove commands.

Once the disk is no longer in use and has been unplugged (making it unused and unplugging it can happen in either order), you can inform the SCSI driver of the removal with the command scsiadd -r ID. ID is the identity of the disk you want to remove, which is determined by which bay the drive is in. The bays are numbered 0, 1, and 2 from left to right (the numbers are printed on top of the case, where you can't see them when it's mounted).

So if you want to swap the second disk, which Linux usually (but not always) knows as sdb, use the command scsiadd -r 1 to inform the Linux SCSI driver that the disk has been removed. Then you can insert the new drive (or re-insert the drive you just removed) and use the command scsiadd -a ID (in this example scsiadd -a 1) to make Linux recognise the new disk. After that you are free to partition the disk and add it to software RAID ready for use.
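Putting it all together, a complete swap of the drive in bay 1 might look like the following sketch (the sfdisk line, which copies the partition table from the surviving disk, is just one convenient way to partition the new drive):

raidsetfaulty /dev/md1 /dev/sdb1      # skip this if the drive has already failed
raidhotremove /dev/md1 /dev/sdb1
scsiadd -r 1                          # tell the SCSI driver the disk is gone
# physically swap the drive in bay 1, then wait a little (see below)
scsiadd -a 1                          # register the new disk
sfdisk -d /dev/sda | sfdisk /dev/sdb  # copy the partition table from the good disk
raidhotadd /dev/md1 /dev/sdb1         # start reconstruction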

If you see the error message "parity error detected" then after waiting for a minute or two (and seeing many other error messages) the error should be corrected and the drive should be recognised. It is best to leave the drive physically in place for some time before running the scsiadd -a command, to reduce the risk of this error. Sometimes this error can only be solved with a hardware reset...

Booting

To make a RAID-1 device bootable you first have to use fdisk to set the bootable flag on both of the partitions for the root file system (if one disk is removed you want the other disk to be bootable).
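In fdisk the bootable flag is toggled with the a command; a session for one disk might go like this (partition 1 as the root partition is an assumption, use whichever partition actually holds the root file system):

fdisk /dev/sda
# at the fdisk prompt:
#   a      toggle the bootable flag
#   1      partition number (assumed here to be the root partition)
#   w      write the partition table and exit
# then repeat for /dev/sdb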

Then you have to configure LILO with root=/dev/md1 and boot=/dev/md1 lines to make the root file system device the boot device (NB: if you use a RAID device other than /dev/md1 for the root file system then adjust the LILO configuration accordingly). The LILO configuration is in /etc/lilo.conf; to apply the changes run the lilo command with no parameters.
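A minimal /etc/lilo.conf for this setup might look like the following sketch (the kernel image path and label are assumptions; keep whatever your existing configuration uses):

boot=/dev/md1
root=/dev/md1
# the image path and label below are assumptions
image=/vmlinuz
        label=Linux
        read-only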

Finally you have to use the install-mbr command to set up a boot block that the BIOS can run to load the LILO block. Use the following commands:

install-mbr /dev/sda
install-mbr /dev/sdb
This installs the Debian MBR on both hard drives so that whichever drive is removed, the other has a boot loader that can load LILO to boot Linux.

Installation

I haven't written documentation on setting this up from scratch as it's too difficult and painful. The best way to install new machines is to take a hard drive from a machine that's already installed (it's hot-swap, so this won't interrupt service).

If a new software RAID installation has to be done then get Paul or Russell to do it.

