Re: Fileserver Issues
On Sun, Sep 02, 2007 at 04:07:30PM +1000, Nathan O wrote:
[snip trouble re raid5 and kernel panics]
> it spent too long resyncing and all is well. The degraded array is
> mounted and working fine. I erased and created an ext3 partition on
> the suspect drive and data is being copied to it as I type. I don't
> think this is a hardware problem, issues only happen if I add this
> drive to the array and leave it to resync for a couple of minutes.
> What should I try from here short of purchasing new hardware? I've
> included some of the different messages from my kernel log if they are
> of any use.
I have a couple of ideas:
First, the drive itself. Make sure smartmontools is installed and run a
manual long self-test. Bear in mind that copying files onto a filesystem
won't work the drive the way syncing it into a raid array does, since
most of a filesystem's blocks stay empty. To simulate that load without
using the raid5 kernel code, I would run wipe -k on the whole drive.
Yes, it will take a long time, but it will thoroughly exercise every
block of the drive; any errors should show up in syslog. Then run
mke2fs -c on it to do a badblocks scan while making an ext2 (not ext3,
you don't need a journal) filesystem. Then run e2fsck -c -c on it to do
a non-destructive read-write test. Finally, run a long SMART test
again. If after all this there are no errors or kernel panics, you can
trust the drive.
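As a rough sketch, the sequence above could be written down like this.
The script only prints the commands so you can run them one at a time;
/dev/sdX is a placeholder of mine, not your real device name, and the
function name is made up for illustration. Note that wipe and mke2fs
destroy everything on the target.

```shell
#!/bin/sh
# Prints the drive test sequence; it does not execute anything.
# /dev/sdX is a placeholder (assumption): pass the real device as $1.
# WARNING: wipe and mke2fs destroy all data on the target device.
drive_test_plan() {
    dev="${1:-/dev/sdX}"
    cat <<EOF
smartctl -t long $dev      # start long self-test; it runs inside the drive
smartctl -l selftest $dev  # read the result once the test has finished
wipe -k $dev               # overwrite every block, keep the device node
mke2fs -c $dev             # ext2 filesystem plus read-only badblocks scan
e2fsck -c -c $dev          # non-destructive read-write badblocks scan
smartctl -t long $dev      # long self-test again
smartctl -l selftest $dev  # compare with the first result
EOF
}

drive_test_plan "$@"
```

smartctl -t long returns straight away and the test carries on in the
background, so wait for it to complete before starting the next step.
Keep an eye on syslog while wipe and the badblocks scans are running.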
Then the raid5 issue. Over the last day or three, there was a thread
on debian-user about the problems with raid5 itself (with no mention of
kernel bugs). Review it, then decide whether you could switch to raid1
or raid10, with or without LVM. From the error messages you supplied,
I'm guessing that raid5 uses different kernel modules than raid1 and
raid0. If there truly is a bug in the kernel raid5 code, then getting
away from it would seem prudent.
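To check the guess about modules, you can look at which md
personalities the running kernel has registered. A minimal sketch,
assuming the usual "Personalities :" first line in /proc/mdstat; the
function name is mine, and it takes the path as an argument so it can
also be pointed at a saved copy of the file.

```shell
#!/bin/sh
# List the md personalities registered with the running kernel, one per line.
# Assumes a first line such as: "Personalities : [raid1] [raid5]"
md_personalities() {
    mdstat="${1:-/proc/mdstat}"
    sed -n 's/^Personalities : *//p' "$mdstat" \
        | tr -d '[]' \
        | tr -s ' ' '\n' \
        | sed '/^$/d'
}

# Only query the live system when md support is actually present.
if [ -r /proc/mdstat ]; then
    md_personalities
fi
```

lsmod | grep raid shows the matching modules when they are not built
into the kernel; the exact module names vary between kernel versions.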
As always, I hope you have solid, reliable backups.