Paul E Condon wrote:
> The following is just a few examples from kern.log:
> May 8 11:32:49 cmn kernel: [4880283.861051] end_request: I/O error, dev sda, sector 16136192

Ouch!  You have a disk that is crying out for help.  Oh the pain and
suffering of it!

> All of them have the same sector number. This is the sda drive,
> which is formatted as ext4. Is there some way that the automatic
> reallocate could the repaired by a forced manual fsck? and is the
> rescue function on the netinst CD adequate for this?

I have often been in your same situation.  I would ensure that the
backup is current and valid and then replace the disk.  That is me.  I
have seen disks get worse very quickly after they have exhibited
failures.  Modern disks keep a pool of internal spare sectors and remap
bad sectors to them silently.  By the time the disk is showing errors
externally the internal spares have probably all been consumed by
earlier failures.

Problems like this will quickly make you a believer in RAID.  I pretty
much RAID everything these days just to avoid being in this situation.
In a RAID the bad disk would have already been kicked out of the
array.  The array would then be left running in degraded mode on the
remaining drives.  The system would keep running without problems.
Replacing the failing drive and backfilling the array can all occur
while the system is up and online.

> Not running SMART.
> What Debian package provides smartctl ?

  apt-get install smartmontools
  smartctl -l error /dev/sda

I expect that to show errors.

  smartctl -t short /dev/sda
  sleep 120
  smartctl -l selftest /dev/sda

I expect that to show errors too.

> I don't think the following tests will make the reallocation problem
> go away.

Nope.  Seems like a disk failure to me.

> I was planning to do something else this weekend,

Oh well.  RAID.  I can't say enough good things about it in these
situations.  And backup.

BTW... I have a low priority machine that is complaining right now
that its SMART selftests are failing.  It hasn't gotten to the actual
I/O error stage yet but it is only a matter of time.  It is a low
priority machine so I haven't actually done anything yet.  It is still
up and running.  But I have a spare disk and as soon as I get a few
spare minutes this weekend I am going to go swap out the failing disk
for another.  But tomorrow looks pretty busy for me.  I probably won't
get to it until Monday.  And I have no stress about it because it is a
RAID and the other disk is healthy.  Plus backups are current.

Bob
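
P.S.  To confirm that every error really does land on the same sector,
something like the following might help.  It is only a sketch: it
assumes the log lines use the "end_request: I/O error, dev ..., sector
..." wording quoted above, and the function name tally_io_errors is
made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count kernel I/O errors per device and sector.
# Reads kern.log-style lines on stdin and prints "COUNT DEV SECTOR",
# most frequent first. Lines without an I/O error are ignored.
tally_io_errors() {
    sed -n 's|.*I/O error, dev \([^,]*\), sector \([0-9][0-9]*\).*|\1 \2|p' |
        sort | uniq -c | sort -rn
}

# Usage (the log path is an assumption; adjust for your system):
#   tally_io_errors < /var/log/kern.log
```

A single output line with a large count would mean one bad sector is
being hit repeatedly.  Many different sectors would suggest the damage
is spreading.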