On Sun, Jul 13, 2003 at 01:51:23PM +0200, martin f krafft wrote:
> Folks,
>
> Over the past year, I have replaced around 20 IDE hard drives in
> 5 different computers running Debian because of drive faults.
> I know that IDE is "consumer quality" and no more, but it can't
> be the case that the failure rate is that high.
>
> The drives are mostly made by IBM/Hitachi, and they run 24/7, as
> the machines in question are either routers, firewalls, or servers.
>
> Replacing a drive would be the result of symptoms such as frequent
> segmentation faults, corrupt files, and zombie processes. In all
> cases, I replaced the drive, transferred the data (mostly without
> problems), got the machine back into a running state, then ran
> `badblocks -svw` on the old disk. Usually, I'd see a number of bad
> blocks, often in excess of 100.
>
> The other day, I received a replacement drive from Hitachi, plugged
> it into a test machine, ran badblocks, and verified that there were
> no bad blocks. I then put the drive into a firewall, synced the
> data (ext3 filesystems), and was ready to leave the computers be
> and head off to the lake... when the new firewall kept reporting
> bad reloc headers in libraries, APT would stop working, there would
> be random single-letter flips in /var/lib/dpkg/available (e.g.
> swig's Version field would be labelled "Verrion"), and the system
> kept reporting segfaults. I consequently plugged the drive into
> another test machine and ran badblocks -- and it found more than
> 2000 -- on a drive that had none the day before.
>
> Just now, I got another replacement from Hitachi (this time it
> wasn't a "serviceable used part" but a new drive), and out of the
> box it featured 250 bad blocks.
>
> My vendor says that bad blocks are normal, and that I should be
> running the IBM drive fitness test on the drives to verify their
> functionality. Moreover, he says that there are tools to remap
> bad blocks.
All hard drives have a certain number of defects when new, owing to
the difficulty of making the platters absolutely perfect. The
locations of these defects are stored in a table on the drive, and
the drive then doesn't use those areas. This is totally transparent,
and there's no way badblocks should know about these defects. (Some
drives even have a dump of this defect list printed on the label,
although not very often these days.) For your vendor to be offering
this as an explanation for what you're experiencing suggests to me
that either he doesn't know very much about what he's selling, or
he's pulling your plonker.

> My understanding was that EIDE does automatic bad sector remapping,
> and if badblocks actually finds a bad block, then the drive is
> declared dead. Is this not the case?

SCSI does this. In addition to the manufacturer's defect list
referred to above, SCSI drives have a separate grown defect list
used for automatic bad sector remapping. It's possible to dump the
contents of this list, e.g. with scsiinfo, and when the number of
grown defects starts increasing you get a chance to replace the
drive before the table fills up. I could be wrong, but I don't think
EIDE does this.

> The reason I am posting this is because I need mental support. I'm
> going slightly mad.

I've been mad for years, absolutely f**king years; I've been over
the edge for yonks. :-)

> I seem to be unable to buy non-bad IDE drives, be they IBM,
> Maxtor, or Quantum. Thus I spend excessive time replacing drives
> and keeping systems up by brute force. And when I look around,
> there are thousands of consumer machines that run day in, day out
> without problems.

Are they all coming from the same source? Or, if you get them from
different sources, is there a common link in the delivery chain?

> It may well be that Windoze has better error handling when the
> harddrive's reliability degrades (I don't want to say this is
> a good thing).
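Besides scsiinfo, the smartmontools package can report the same sort of information; for SCSI drives, `smartctl -a` typically prints a line such as "Elements in grown defect list: N", and it's that number you want to watch for growth. Below is a minimal sketch of pulling the count out of such a line — the device name is an example, and the sample line (including the count of 17) is made up for demonstration:

```shell
# On a real system you would run something like:
#   smartctl -a /dev/sda
# and, for a SCSI drive, look for the grown-defect line in its output.
# Here we parse a captured example line instead (the count is invented):
line="Elements in grown defect list: 17"

# Split on the colon and force the remainder to a number.
count=$(printf '%s\n' "$line" | awk -F: '{print $2 + 0}')
echo "grown defects: $count"
```

Logging that one number from cron and comparing it week to week is enough to spot a drive that has started to deteriorate.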
This is almost certainly not true, at least as far as FAT32 is
concerned. The bad sector won't be marked as bad in the FAT until
you run Scandisk, so Windows will just go on trying to use it and
crashing on the resulting errors.

> It may be that IDE hates me. I don't think it's my IDE
> controller, since there are 5 different machines involved, and the
> chance that all IDE controllers report bad blocks where there
> aren't any, but otherwise function fine with respect to detecting
> the drives (and not reporting the dreaded dma:intr errors).
>
> So I call to you and would like to know a couple of things:
>
> - does anyone else experience this?

It is something I associate with secondhand drives that may not
necessarily have been handled with due care.

> - does anyone know why this is happening?
> - why is this happening to me?

I like to festoon my hard drives with fans (run off 5V instead of
12V to keep the noise down), as they can object to the temperatures
they can heat themselves up to - but I think we can rule out heat in
the case of your brand-new Hitachi drive, which is knackered as soon
as you start it up.

I think it is also unlikely to be static damage accidentally
inflicted by you - that might well cause various drive errors, but
not of the increasing-number-of-bad-blocks variety. If the
electronics were sufficiently screwed as to send commands to the
mechanism that would cause it to damage itself, it's unlikely the
drive would do anything remotely sensible at all. And drives are
pretty well sealed against environmental contamination: my HDs are
subjected not only to cigarette smoke but to the extremely fine dust
with which pigeons maintain the condition of their feathers, and it
doesn't seem to bother them.

Two possibilities which occur to me are:

Dirty mains - maybe you could try running a machine in a different
part of town. I think it unlikely that this would selectively affect
your HDs, though.
Transit damage - maybe your vendor's warehouse staff tend to sling
boxes around without thought for their contents. Or, if you get them
delivered, maybe the carrier's staff are careless.

> - is it true that bad blocks are normal and can be handled
>   properly?

No. Once a drive starts to get bad blocks, their number tends to
increase exponentially. The only safe thing to do is replace the
drive.

SCSI drives can remap bad blocks transparently - as long as they
'catch' the dodgy block before it can't be read at all. IMO this
feature should be used to let you find out that the drive is going
dodgy, so you can hopefully replace it before you lose any data -
not to let you blithely forget that bad blocks exist. :-)

> - can bad blocks arise from static discharge or impurities?

Static discharge IMO is more likely to cause faults other than bad
blocks, and as I say, particulates don't seem to be a problem. If
the drives have been stored in damp or humid conditions, though,
that could do them a lot of no good.

> when I replace disks, I usually put the new one into the case
> loosely and leave the cover open. The disk is not subjected to
> any shocks or the like, it's sitting still as a rock, it's just
> not affixed.

That shouldn't be a problem as long as it's not actually rattling or
buzzing against something. Mounting a piece of vibrating machinery
rigidly can actually increase the vibration-induced component of the
loads on the bearings.

> I will probably never buy IDE again. But before I bash companies
> like Hitachi for crap quality control, I would like to make sure
> that I am not the one screwing up.

If someone's screwing up, it sounds to me like it's someone between
Hitachi etc. and you.

-- 
Pigeon

Be kind to pigeons
Get my GPG key here: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x21C61F7F
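One practical way to watch for that exponential growth: badblocks with `-o` writes one bad block number per line, so keeping the output of successive read-only scans and comparing line counts shows the trend. A sketch - the scan files below are simulated stand-ins, and the device name in the comment is an example; on a real drive you'd generate them with a read-only scan such as `badblocks -sv -o scan.txt /dev/sdb`:

```shell
# Simulated outputs of two badblocks runs (real files would come from
#   badblocks -sv -o <file> <device>   -- a read-only scan).
printf '1024\n2048\n4096\n' > scan-old.txt
printf '1024\n2048\n4096\n8192\n16384\n' > scan-new.txt

# One bad block number per line, so line count = total bad blocks.
old=$(wc -l < scan-old.txt | tr -d ' ')
new=$(wc -l < scan-new.txt | tr -d ' ')
echo "bad blocks grew from $old to $new"

rm -f scan-old.txt scan-new.txt
```

If the count between two scans rises at all, that's the cue to replace the drive rather than wait for the curve to take off.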