
Re: Soft RAID1 and SATA - Hardware failure test - I power off disk and system freezes



On Thu, 15 Dec 2005, Alvin Oga wrote:
> some motherboards do NOT like ( recognize ) the 2nd disk on the same
> ide cable if the primary disk is offline

SATA != IDE.

> you can also dd if=/dev/zero on the disk ( /dev/hdc ) too and try to see
> if the sw raid ( running on /dev/hda ) rebuilds it for you

Hmm, the raid is in md#, not hd# or sd#.  hd# and sd# are member devices.

Anyway, be VERY careful if you are doing something like this.  It is easy to
forget that RAID does not protect against data corruption (even if it CAN
help you recover from corruption if you know which devices are not
corrupted, etc).  It protects against member device *failures*, and as far
as Linux is concerned, that means the disk reporting errors or failing to
answer, and definitely NOT someone writing crap to a member of an array
behind md's back.

To make it even more clear:  if md doesn't notice a device failure, it will
not do what you expect.  Writing to hd#/sd# is *not* a device failure.  And
mucking with the last 128KiB of the array member devices could potentially
make md think that the OTHER device is the one with old stale data, and
cause data loss.  So, don't do it to member devices of an md RAID array,
regardless of RAID level.
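
As a guard against doing that by accident, here is a minimal sketch that
checks whether a device is currently listed as an array member before you
write to it.  The function name is_md_member is made up for illustration,
and the check assumes the usual /proc/mdstat layout where members appear as
"sda1[0]" on the array line.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the device (e.g. sda1 or /dev/sda1) is
# listed as a member of some md array.  The optional second argument is an
# alternative mdstat file, which is handy for dry runs.
is_md_member() {
    name=${1#/dev/}
    mdstat=${2:-/proc/mdstat}
    # Assumption: member entries look like "sda1[0]" on each array's line.
    grep -qE "(^|[[:space:]])${name}\[[0-9]+\]" "$mdstat"
}
```

Something like "is_md_member /dev/sd# && echo 'still in an array, hands off'"
before any dd would catch the obvious mistake.  It does not replace the
--fail/--remove steps, it only tells you they have not been done yet.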

You can safely test resync using:
   mdadm --manage /dev/md# --fail /dev/sd#
   mdadm --manage /dev/md# --remove /dev/sd#
   <do whatever you want to the member device you just removed from the
   array, but end it with mdadm --zero-superblock /dev/sd# if there is any
   chance you mucked with the last 128KiB>
   mdadm --manage /dev/md# --add /dev/sd#

   Then watch the resync in /proc/mdstat.  When it is done, you can stop
   the array and compare the two members; you should find differences only
   in the last 128KiB of the devices (the RAID superblock).
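
A rough sketch of that comparison, for illustration only.  The names
compare_members and dev_size are made up, and the script assumes both
members are the same size, that the array has already been stopped, and
that cmp supports -n (GNU and BSD cmp do) and blockdev comes from
util-linux.

```shell
#!/bin/sh
# Hypothetical helper: compare two RAID1 members while skipping the
# trailing 128 KiB, where the md superblock is expected to live.
TAIL_BYTES=$((128 * 1024))

# Size in bytes of a block device, or of a regular file for dry runs.
dev_size() {
    if [ -b "$1" ]; then
        blockdev --getsize64 "$1"
    else
        wc -c < "$1" | tr -d ' '
    fi
}

# Exit status: 0 = data areas identical, 1 = they differ, 2 = cannot compare.
compare_members() {
    size_a=$(dev_size "$1") || return 2
    size_b=$(dev_size "$2") || return 2
    # Assumption: equal-sized members; refuse anything else.
    if [ "$size_a" -ne "$size_b" ]; then
        echo "member sizes differ, not comparing" >&2
        return 2
    fi
    limit=$((size_a - TAIL_BYTES))
    [ "$limit" -gt 0 ] || return 2
    # Compare only the first $limit bytes of each member.
    cmp -n "$limit" "$1" "$2"
}
```

Run it as "compare_members /dev/sd# /dev/sd#" with the two members of the
stopped array.  It only reads from the devices, so it is safe to repeat.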

PS: I am not sure where the RAID superblock ends up when you fill a
partition/block device only partially (by using devices of different sizes
in a RAID1 array for example).  It may be at the end of the device itself,
or just after the RAID data area.
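
For what it is worth, here is a sketch under the assumption that the array
uses the 0.90 superblock format.  As far as I know, that format computes
the superblock position from the size of the member device itself: it goes
in the 64KiB-aligned block that starts at least 64KiB and less than 128KiB
before the end of the device.  The newer 1.x formats place it elsewhere, so
treat this as illustrative.  The name sb090_offset is made up.

```shell
#!/bin/sh
# Hypothetical helper: byte offset of a 0.90-format md superblock, given
# the member device size in bytes.
# Assumption: round the size down to a 64 KiB boundary, then step back
# one more 64 KiB block.
sb090_offset() {
    reserved=$((64 * 1024))
    echo $(( ($1 / reserved) * reserved - reserved ))
}
```

You would feed it the output of "blockdev --getsize64 /dev/sd#".  If the
assumption holds, the result depends only on the member's own size, not on
how much of it the RAID data area uses.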

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh


