
Re: Debian Raid Crash Repair



On Mon, 14 Nov 2005, Siju George wrote:
> On 11/14/05, Alvin Oga <aoga@mail.linux-consulting.com> wrote:
> > On Mon, 14 Nov 2005, Siju George wrote:
> > > I had a mirror of sarge with 2 disks. One of them failed now. I had
> > > given an option for 1 spare disk while configuring Raid. Could some
> > > one please tell me what I should do to Place a new disk and recreate
> > > the mirror?? Should I manually partition the new disk or is there a

Write a boot sector to both hdc and to a floppy or other removable media
just in case.

Add new disk, removing the failed one from the system.

Now, what you do depends on how you want the new disk to be used...

If you want to expand the RAID:

  Boot single user. Kill udevd if necessary.

  Partition new disk. Create a new RAID array in degraded mode using mdadm.

  Move data to new RAID array (creating filesystems and lvm volumes as
  needed, don't forget the swap partition!). Edit data in new RAID array to
  refer to md1 instead of md0 if you are using kernel autorun (AFAIK it is
  non-trivial to get it to go to md0 without booting a live-cd system and
  renumbering the minor device number).

  Remove old *working disk*, store it somewhere as the valuable backup of
  all your data that it is :-)  Add the second "new" disk, boot, make sure
  everything is correct, partition the second "new" disk and hotadd to the
  array.  Rerun LILO.  All done.
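The mdadm steps above might be sketched like this (the device and array names /dev/hdc1, /dev/md1 and /dev/hda1 are examples; adjust them to your layout):

```shell
# Sketch only; /dev/hdc1, /dev/md1 and /dev/hda1 are example names.
# Create the new RAID1 array in degraded mode: the keyword "missing"
# reserves the second slot for a disk to be added later.
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/hdc1 missing

# Later, after partitioning the second "new" disk, hot-add it and
# let md rebuild the mirror in the background:
mdadm /dev/md1 --add /dev/hda1
```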

If you will use the new extra space in another way (a second RAID array,
perhaps?):

  Partition and hotadd new disk to the array. Rerun LILO. All done.
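For this case the whole procedure is a couple of commands (array and device names are examples):

```shell
# Assumes the new disk /dev/hdc has already been partitioned to
# match (see the partition-table tricks below) and the array is
# /dev/md0; both names are examples.
mdadm /dev/md0 --add /dev/hdc1   # md resyncs in the background
lilo                             # reinstall the boot loader
cat /proc/mdstat                 # watch the rebuild progress
```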

> > > command that I can run after connecting the disk so that the Raid
> > > Partitions will be created automatically and the rest of the space in
> > > the hard disk be freely available? I would like to place an 80 GB disk
> > > instead of a 40 GB one.
> >
> > - it would be pointless to use a new 80gb disk instead of a 40gb disk
> >         - the other 40gb is sorta wasted and unused

Only if you want to let them go to waste. And maybe the 80gb disks are
cheaper than 40gb ones where he lives?  Anyway, it does not have to be
pointless at all.

> > - if your system crashed:
> >         - why did it crash

Was the swap over RAID1 too? If it was *not*, we have a damn good reason
for the box to crash.

As for the boot, use a properly configured LILO.  It does the right thing for
RAID arrays if your BIOS isn't braindead, and the system will be able to
boot from hdc if hda goes missing in action.

Since LILO *has* the bad brain disease of writing crap to the first sector
of a partition unless told to do its job right, here's what one needs (for
completeness): 
	boot=/dev/<raid device>
	raid-extra-boot=mbr-only
That keeps the LILO crap where it belongs: the MBR, and *only* there.
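Put together, a minimal RAID-aware lilo.conf might look like this (the kernel path and label are examples):

```
boot=/dev/md0            # install the loader for the RAID device
raid-extra-boot=mbr-only # write only to the MBRs of the components
root=/dev/md0
image=/vmlinuz
        label=Linux
        read-only
```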

Other useful hints:

You can re-read a partition table using fdisk like this:

   fdisk <device>
   w
   q

This doesn't change the partition, and forces a reread if the kernel
doesn't have a partition locked for some reason (used as / device,
or mounted, or in an active lvm vg or md device...).
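If your util-linux is recent enough, blockdev can do the same reread non-interactively (an alternative to the fdisk trick, not a requirement):

```shell
# Ask the kernel to re-read the partition table; fails with EBUSY
# if any partition on the disk is still in use.
blockdev --rereadpt /dev/hdc
```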

You can duplicate a DOS-like partition table doing this:

   dd if=/dev/<source disk> of=/dev/<dest. disk> bs=512 count=1
   fdisk <dest disk>
   w
   q
   
*IF* you have no extended partitions.  Warning: this partially duplicates
the MBR boot loader, so reinstall the loader (LILO, grub, etc.) on the new
disk after you do this.

Do *NOT* do this if you use a partition table that uses UUIDs.
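If you do have extended partitions, sfdisk can dump and restore the whole table as text, logical partitions included (the same warnings about duplicated loaders and identifiers apply; device names are examples):

```shell
# Dump the complete partition table of the source disk and feed it
# to the destination disk.
sfdisk -d /dev/hda | sfdisk /dev/hdc
```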

> Bad sectors on "hda" was the reason for the crash. Now the Server is

Bad sectors in an md component do not cause a crash. md just drops the
component from the active pool of the array.

> Is there a quick way to partition it the same way as the disk I am
> replacing?? Again I am replacing the 40 GB with the 80 GB one.

See above.

> > - cat /proc/mdstat to see what is doing or not doing
> >         - if its syncing .. leave it alone .. do not power off,
> >         or add new files, unless you like to be on the bleeding
> >         edge and test that the raid stuff is working "right"

If you cannot trust this, you cannot trust the RAID.  Modifying a running md
RAID array while it is syncing *IS* supposed to be safe and to work right.
If it doesn't, your kernel is crap and you cannot trust its md at all... and
you had better find that out sooner rather than later.

AFAIK, the md device will ignore writes past the current resync cursor on
component devices that are being rebuilt (it writes only to the rest of the
components), and write anything behind the resync cursor to all component
devices including the one(s) being rebuilt.

You can test the RAID sync, you know. Just compare the two md component
devices, and ignore errors in the last 128KiB (the md superblock).
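Such a check might be sketched like so (component names are examples; with a 0.90 superblock the metadata sits in the last 64 to 128KiB of each component):

```shell
# Compare two RAID1 components byte-for-byte, skipping the trailing
# 128KiB where the md superblock (which legitimately differs) lives.
SZ=$(blockdev --getsize64 /dev/hda1)
cmp -n $((SZ - 131072)) /dev/hda1 /dev/hdc1 && echo "components match"
```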

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh


