
Suggestions wanted for torture-testing my hard-drives...



I recently put together a RAID5 array with three 250GB SATA drives (using the raid456 driver). One partition, 500GB, ext2.

Everything looked like it should, so I started moving a bunch of large files over to the new drive... only to discover, after moving 20-30GB or so, that the filesystem had somehow become read-only. Huh? So, not suspecting anything really wrong, I unmounted it, mounted it back... and was able to continue. Again, it turned read-only after some copying.

Becoming concerned, I unmounted and fsck'd it. It found oodles of problems and lots of files got hosed. Fortunately, many of the files were ones that I got via P2P, so I could get them again. So, with a clean filesystem, I remounted and went on to replacing the hosed files. AGAIN, the filesystem went read-only after a day... particularly after some heavy activity.

I googled the error messages I was seeing in the logs and found someone else who had gotten the same messages. A reply to that post said the writer had seen this happen to *their* raid when one of the drives was flaky, but only when the drives were under load. Ah ha! Sounds familiar!

So, now, I need to torture-test my drives. It's a new (as in, I've never used it before and I'm not sure it's not the problem) no-name sata controller, a new sata cage, and new drives. I've plugged a known-good 200GB drive into the 4th sata port on the controller. So, I want to, in this order:
1 - Load-test the known-good 200GB to make sure that the controller isn't bad.
2 - Move the data from the raid to the known-good
3 - Load-test the raid to make sure that I can reproduce the problem
4 - Load-test the individual drives in the raid.

My questions for the list are:
1 - Is this a good strategy, or is there something else I should do?
2 - How do you propose that I give the drive(s) a really good workout?
   o I could try moving a bunch of files to and fro, but problems might not turn up until I spot certain lines in syslog, or until I unmount and fsck, which is cumbersome (although it is known to produce the problem).
   o I could use badblocks, but that bypasses the filesystem and, hence, might be gentler on the drive. To compensate, I've thought of starting several non-destructive read/write scans with different instances of badblocks (since you can start each one at a different place in the drive). Or, maybe I should use Bonnie++?
   o I've found lots of server-loading script sites out there, but I'm hoping that someone on the list can help cull the flock a bit. A lot of the scripts seem to be aimed at loading to see how it affects the *speed* of the system, while I'm looking for how it affects *data-integrity*. For example, something that just does a bunch of random seeks on the drive isn't going to help me, because I want something that makes sure that it is getting the data that it's expecting.
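
To make the data-integrity requirement concrete, here is a minimal sketch of the kind of check I have in mind: write files full of reproducible pseudo-random data through the filesystem, then read them back and compare against what was supposed to be there. It is only an illustration, not a tested tool. The function names (pattern_chunk, write_file, verify_file, torture), the file names, and the sizes are all made up for this example.

```python
import os
import random

CHUNK = 1024 * 1024  # 1 MiB per chunk (arbitrary size chosen for this sketch)


def pattern_chunk(seed, index, size=CHUNK):
    """Return deterministic pseudo-random bytes for one chunk of one file."""
    rng = random.Random(seed * 1_000_003 + index)
    return rng.getrandbits(size * 8).to_bytes(size, "little")


def write_file(path, seed, chunks):
    """Write `chunks` chunks of known data and ask the OS to flush them."""
    with open(path, "wb") as f:
        for i in range(chunks):
            f.write(pattern_chunk(seed, i))
        f.flush()
        os.fsync(f.fileno())


def verify_file(path, seed, chunks):
    """Return the indices of chunks whose contents differ from what was written."""
    bad = []
    with open(path, "rb") as f:
        for i in range(chunks):
            if f.read(CHUNK) != pattern_chunk(seed, i):
                bad.append(i)
    return bad


def torture(directory, files=4, chunks=8, passes=2):
    """Write all files, then verify all files, repeated `passes` times.

    Returns a list of (pass number, path, chunk index) for every mismatch.
    """
    failures = []
    for p in range(passes):
        jobs = []
        for n in range(files):
            seed = p * files + n  # different data on every pass
            path = os.path.join(directory, f"torture_{n}.bin")
            write_file(path, seed, chunks)
            jobs.append((path, seed))
        for path, seed in jobs:
            for i in verify_file(path, seed, chunks):
                failures.append((p, path, i))
    return failures
```

It would be run as something like torture("/mnt/raid/scratch", files=50, chunks=1024, passes=10), where the path is hypothetical and the numbers are guesses. One caveat: a read-back done right after the write may be served from the page cache rather than from the disks, so the total amount written per pass probably needs to be well beyond the machine's RAM for the verify step to mean anything.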

So... suggestions?

- Joe


