On Mon, 04 Jan 2010 20:34:09 +0800, Thomas Goirand <thomas@goirand.fr> wrote:
> Ross Halliday wrote:
> > Aside from any bugs that checkarray
> > function is definitely a pain on a production system.

I have this same problem with the Lenny kernels on certain machines. I have not yet been able to identify anything specific that the affected machines have in common.

Essentially, on these systems, the monthly raid check requires a reboot, as the drive subsystem becomes so blocked that the load goes over 500 and the raid resync never completes. I can wait for days and it won't finish. If I reboot the system and sync the raid arrays before anything starts to use that particular partition, then everything works fine.

On these systems I disable the monthly raid check. It's not the right solution, obviously, but it sucks to wake up on Sunday morning to find multiple outages due to this scheduled raid check.

> Well, it's even more a pain to have no monthly check at all, and have
> your drive silently die without a warning. Also, my findings is that
> most of the time, such lock-up happens only on certain kind of
> controllers, or with defective (half working) HDD.

I agree silent drive death is bad, but in a raid mirror setup, if one of the drives dies, won't you be fine?

I am pretty certain it's not a particular type of controller, because I have a number of machines with identical hardware; some have this problem, some do not. The 'half working' HDD was my theory as well, but SMART tests and badblocks don't seem to turn up anything.

m