
Re: Corruption many (unknown) files; how best to restore?



On Sun, Apr 22, 2001 at 09:37:55AM -0500, Kent West wrote:
> Hi all.
> 
>     HISTORY (the actual question is below):
> 
> For some reason my Sid box at home has been locking up in X lately. I don't
> know if it's an X problem, or a hardware problem, or what. I'm running 
> 2.2.18; have been for months without any problem, so I doubt it's any bug
> within the kernel.
> 
> I've just run memtest86, and it seems that my RAM's fine.
> 
> I had a hard drive fail a couple of months or so ago; I pulled it out and
> replaced it (it had /usr and /home on it) and rebuilt as best I could; I
> think I got it working.
> 
> Then last week I put that failed drive back in to see if I could recover
> any data off of it before consigning it to its eternal resting place. After
> finding a bay to rest it in and plugging in an IDE data cable, I realized I
> didn't have any extra power plugs for it. So I left it that way until I
> could get a power splitter. I figured that not having any power to it, the
> system would ignore it.

I'm afraid that not having the power cable attached to the drive _may_
have been the initial cause for the lock-ups you're experiencing...

I once created myself a similar problem on a SCSI-based box simply by
attaching a fairly long external SCSI cable to connect my scanner.
Obviously, the cable was too long, as it apparently distorted the
electrical signals on the SCSI bus in such a way that sector addressing
of the drive continued to work, but in a more or less random
fashion! Now, I don't have to elaborate any further on what that means
in terms of file system corruption... Actually, hundreds of files got
corrupted :(
The strange thing was that I didn't get _any_ error messages from the
SCSI subsystem -- after attaching that harmful cable, I was able to
work happily for another quarter of an hour in GIMP before the system
finally completely froze (and interestingly, the scanner did work).
Post-mortem analysis of the corrupted files revealed that the data I
had written to disk during that quarter of an hour seemed to have
been randomly scattered all over the file system, doing its destructive
job as thoroughly as possible...
I'm sharing that story because my educated guess would be that the
electrically floating (unpowered) drive in your case might well have
affected the signals on the IDE bus in a similar way. Hopefully not as
thoroughly as in my case ;)
(Also, while we're on the topic, my advice to anyone toying with the
idea of exceeding the maximum recommended cable lengths for whatever
reason: just don't do it! Or have a recent backup :) )

> 
> I think maybe my suspicion was incorrect, and that the system saw this drive
> and got confused and started doing nasty things. I shut down and unplugged
> the drive, and restarted the system. Everything looked fine, except that
> KDM no longer started an X session; it acted the same way that it would if
> there was something wrong, like a wrong mouse section, in the XF86Config
> file. But I could start X with the startx command, so I just figured it was
> some glitch I downloaded with my most recent upgrade of Sid.
> 
> Nevertheless, since then I've started having lockups in X. It may be
> related to a Windows-based Backgammon game I'm running via Wine (this game
> typically bombs now, whereas it used to work fine).
> 
> To make this (very) long story short, the repeated crashing and subsequent
> resets (no way to ssh/telnet in, and loss of keyboard control) have tended
> to do nasty things to my file system.
> 
> 
>     ACTUAL QUESTION:
> 
> I don't know which files/packages are corrupt; is there any automated way
> to have the system check to see what's installed, what's broken, and what
> needs to be reinstalled to fix what's broken?

I would check my most recent MD5-checksum filelists against the existing
files to produce a list of what's damaged -- hopefully you did create
MD5 lists while the system was still working properly? ;) That doesn't
solve the automatic reinstallation part, though.
Maybe someone else has a better suggestion.

Anyway, it's a good moment to reconsider installing a tool like tripwire
(www.tripwire.org) -- or if you think that's overkill, run something like
"find / -type f -print0 | xargs -0 md5sum >files.md5"  periodically
(the -print0/-0 pair keeps file names containing spaces intact)...
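
Checking such a list later could look roughly like the sketch below.
It assumes a checksum list in the usual md5sum format (files.md5 in
the example above) and GNU md5sum; the function name list_damaged is
just a made-up helper for illustration, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: print the names of all files whose current MD5
# sum no longer matches the one recorded in a list made by md5sum.
# $1 = checksum list (defaults to files.md5)
list_damaged() {
    # "md5sum -c" prints "NAME: OK" for intact files and "NAME: FAILED"
    # (or "NAME: FAILED open or read" for missing/unreadable ones).
    # LC_ALL=C keeps those status words untranslated; sed keeps only
    # the failures and strips the status, leaving the bare file name.
    LC_ALL=C md5sum -c "${1:-files.md5}" 2>/dev/null |
        sed -n 's/: FAILED.*$//p'
}
```

Something like "list_damaged files.md5 >damaged.txt" would then give
the list of suspect files. On a Debian system, "dpkg -S <file>" should
tell which package each of them belongs to, though that still leaves
the reinstallation itself as a manual step.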

Good luck,
Erdmut


-- 
Erdmut Pfeifer
science+computing ag

-- Bugs come in through open windows. Keep Windows shut! --


