
Losing my mind, RAID1 on Sparc completely broken?

I've been messing with this for days now, and it's driving me insane.
Here are the particulars:

- SunFire V100 w/ two 40GB disks, identically partitioned.
- Kernel 2.4.25 (kernel.org source) compiled with egcs64 from Woody
- Kernel 2.6.3 (kernel.org source) compiled with gcc 3.3.3 from Sid
- raidtools2 1.00.3-5 backported to Woody (also tried the Woody version)
- ext3 is being used in all test cases

Both disks work perfectly fine with linux native partitions and a root
filesystem installed on them.  Both disks also work perfectly fine if
one or the other is the single member of a degraded RAID1 array.

Both of these configurations have been stress-tested (under both
kernels) and all is well after a fair number of reads and writes.

The problem comes in when I make both disks members of the same array
(in this case, hda2 and hdc2 are members of md1).  As soon as the array
has synced and I write to it, it corrupts almost instantly.  Unmounting
md1 and running fsck on it shows a large number of illegal blocks.
md5sums of various binaries on the system are wrong, and apt and dpkg's
status files get so horribly corrupted that the tools segfault or refuse
to run.  All within minutes or even seconds of writing to the md device.
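
For anyone who wants to try reproducing this, the write-then-checksum
test described above could be sketched roughly as follows.  This is only
an illustration, not the exact procedure I used: the function name, the
file names and the sizes are all made up, and a re-read shortly after a
write may be served from the page cache rather than the disks, so
unmounting and remounting between the two passes would make for a
stricter test.

```python
import hashlib
import os
import tempfile


def write_and_verify(directory, file_count=4, file_size=1 << 20):
    """Write random files, re-read them, and return paths whose MD5 changed.

    Hypothetical helper for illustration; names and sizes are arbitrary.
    """
    expected = {}
    for i in range(file_count):
        path = os.path.join(directory, f"raidtest_{i}.bin")
        data = os.urandom(file_size)
        with open(path, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # ask the kernel to push the data to disk
        expected[path] = hashlib.md5(data).hexdigest()

    corrupted = []
    for path, digest in expected.items():
        with open(path, "rb") as fh:
            actual = hashlib.md5(fh.read()).hexdigest()
        if actual != digest:
            corrupted.append(path)
    return corrupted


if __name__ == "__main__":
    # Assumption: replace the scratch directory with one on the mounted
    # md device to exercise the array itself.
    with tempfile.TemporaryDirectory() as scratch:
        print(write_and_verify(scratch))
```

An empty list means every file read back with the checksum it was
written with; any path in the list points at a file that changed on
its way to or from the device.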

Now, this looks like a pretty huge glaring bug that I would have
expected others to run into, but Google hasn't turned up anything yet,
so I'm stumped.  Am I going insane, or is there something horribly wrong
here?  I've tested with several disks in two different SunFire V100
machines, always with the same results.  All works well with one disk,
everything blows up with two.

Any help anyone can offer would be VERY much appreciated, as I was
supposed to have this box online several days ago.  <sigh>

... Adam Conrad

P.S. I can't get at the machine right now as I'm at home, and it's not
online, but I'll post relevant configs (fdisk -l, /etc/raidtab, etc.)
tomorrow if someone hasn't already replied with "yes, there's a glaring
bug in Sparc's RAID1, here's more about it".
