
md (raid1) problem on sparc Ultra1



I can reproduce this problem systematically on this machine with woody
and the stock 2.4.18 kernel.

I have an Ultra1 with 4 disks, and I built a raid1 array on two of them.
Whenever I reboot after the disks are in sync, a data access exception
is raised on mounting/fscking. I'm using an ext3 fs on /dev/md0.
The problem disappears when the disks are not in sync, for example after
that kind of failure. Once the disks complete their resync, the
raid array works until the next reboot.
So currently I cannot reboot or halt the computer normally; my workaround
is to stop it (by sending a break on the terminal) and boot again from the
OpenBoot prompt.
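
Since the failure depends on whether the mirrors are in sync at boot, here
is a minimal sketch for reading that state out of /proc/mdstat instead of
checking by eye. It assumes the 2.4-style layout shown in the output below,
and it assumes that a running resync shows up as an extra line containing
"resync" or "recovery"; parse_mdstat and is_in_sync are names made up for
this example.

```python
import re

# Sample text copied from the /proc/mdstat output in this report.
SAMPLE = """Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 sdb1[1] sdc1[0]
      8886912 blocks [2/2] [UU]

unused devices: <none>
"""

def parse_mdstat(text):
    """Return {array: {"flags": str or None, "resyncing": bool}}."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            arrays[current] = {"flags": None, "resyncing": False}
            continue
        if current is None:
            continue  # lines before the first array header
        status = re.search(r"\[\d+/\d+\]\s+\[([U_]+)\]", line)
        if status:
            # One character per mirror: U = up, _ = missing or failed.
            arrays[current]["flags"] = status.group(1)
        elif "resync" in line or "recovery" in line:
            # Assumption: a progress line is printed while md is resyncing.
            arrays[current]["resyncing"] = True
    return arrays

def is_in_sync(info):
    """True when every mirror is up and no resync is in progress."""
    flags = info["flags"]
    return flags is not None and "_" not in flags and not info["resyncing"]

if __name__ == "__main__":
    for name, info in parse_mdstat(SAMPLE).items():
        print(name, info["flags"], "in sync" if is_in_sync(info) else "NOT in sync")
```

On the real machine the text would come from open("/proc/mdstat").read()
rather than from SAMPLE.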


maya:~# cat /proc/mdstat 
Personalities : [raid1] 
read_ahead 1024 sectors
md0 : active raid1 sdb1[1] sdc1[0]
      8886912 blocks [2/2] [UU]
      
unused devices: <none>
maya:~# mount /dev/md0 /data
data_access_exception: SFSR[0000000000801009] SFAR[fffff80026e6a9fc], going.
              \|/ ____ \|/
              "@'/ .. \`@"
              /_| \__/ |_\
                 \__U_/
mount(243): Dax
TSTATE: 0000004411009602 TPC: 0000000002010c9c TNPC: 0000000002010ca0 Y: 07000000    Not tainted
g0: d205c008d6024000 g1: fffff800013e7818 g2: 0000000000000001 g3: fffffffffffffff2
g4: fffff80000000000 g5: fffee00025a83200 g6: fffff800358d4000 g7: 0000000000000014
o0: 0000000000000002 o1: fffff800013e781c o2: fffff80037412cc0 o3: fffff800010bc880
o4: 0000000000000002 o5: 0000000000000001 sp: fffff800358d6c61 ret_pc: fffff800013e7820
l0: fffff800013e7800 l1: fffff800013e7c48 l2: 000000000062e000 l3: 0000000000000002
l4: 0000000000000000 l5: 0000000000626fe0 l6: 00000000effffc18 l7: 0000000000000000
i0: 000000000000000e i1: fffee00025a831dc i2: 0000000000000000 i3: 0000000000000000
i4: 0000000000000002 i5: fffff800013e7800 i6: fffff800358d6d21 i7: 0000000002010e80
Caller[0000000002010e80]
Caller[00000000020003c0]
Caller[00000000004e7a10]
Caller[00000000004e7b00]
Caller[00000000004e7cd0]
Caller[000000000046787c]
Caller[0000000000493790]
Caller[000000000046afa4]
Caller[000000000046b5b0]
Caller[000000000047e8b0]
Caller[000000000047ebe8]
Caller[000000000042de48]
Caller[0000000000410af4]
Caller[000000000001288c]
Instruction DUMP: f84763d8  b2067fdc  84073fff <c606400f> 80a0e000  12480013  b938a000  c4064009  80a0a000
Killed
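
The Caller[...] lines above are raw addresses. In case it helps anyone
reading the trace, here is a rough sketch of mapping them to the nearest
preceding symbol from a System.map file (lines of the form "address type
name"), in the spirit of what ksymoops does. extract_callers,
load_system_map and resolve are illustrative names, and the two symbols in
the example are hypothetical placeholders, not real kernel symbols.
Addresses inside loadable modules are not covered by System.map, so some
frames may resolve wrongly or not at all.

```python
import bisect
import re

def extract_callers(oops_text):
    """Return the Caller[...] addresses from an oops as integers, in order."""
    return [int(h, 16) for h in re.findall(r"Caller\[([0-9a-fA-F]+)\]", oops_text)]

def load_system_map(lines):
    """Parse 'address type name' lines into a list sorted by address."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 3:
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols

def resolve(symbols, addr):
    """Return 'name+0xoffset' for the closest symbol at or below addr."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None  # addr lies below the first known symbol
    base, name = symbols[i]
    return "%s+0x%x" % (name, addr - base)

if __name__ == "__main__":
    # Hypothetical symbol table, only to show the lookup; a real run would
    # pass open("System.map") for the kernel that produced the oops.
    demo_map = [
        "00000000004e7a00 T hypothetical_sym_a",
        "00000000004e7c00 T hypothetical_sym_b",
    ]
    symbols = load_system_map(demo_map)
    trace = "Caller[00000000004e7a10]\nCaller[00000000004e7cd0]\n"
    for addr in extract_callers(trace):
        print("%016x" % addr, resolve(symbols, addr))
```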

I can then stop the machine (by sending a break) and boot it again, after
which the mount succeeds (the raid array then needs to resync, which takes
about half an hour).
I also swapped the disks, with no change.

Any hints?



-- 
Francesco P. Lovergine


