On Mon, Jul 31, 2006 at 09:41:51PM +0100, Giles McGarry wrote:
> Dear all, I've just inherited a Debian system. I'm afraid I'm not very
> experienced with Debian, coming more from a Solaris background, so
> please be patient if the questions are numpty.
>
> I have a problem at the moment: strangely, various binaries in the /bin
> directory are changing size and becoming corrupt. When I restore the
> originals they work OK, and then at some time later they change size
> and stop working. I've now restored all of the files (there's about a
> dozen) into /bin2, which I can use when the ones in /bin get corrupt.
> The original (and working) file in /bin2 is as follows:

I am very inexperienced in these things, but from a simple
"problem-solving" point of view, I suggest the following...

I'd be afraid you've either been rooted, or you have failing hardware
that is causing this. Hardware problems should show up elsewhere in the
file tree as well, though.

can you remount the partition read-only? then if changes still show up,
you can check whether it's been remounted read-write again... sure sign
of rooting, IMVHO.

> /bin2/ls -l /bin2/ls
> -rwxr-xr-x 1 root root 75948 Jul 31 17:17 /bin2/ls
>
> but the one in /bin is different, i.e.
>
> /tmp/ls -al /bin/ls
> -rwxr-xr-x 1 root root 80044 Jul 31 21:29 /bin/ls
>
> As you can see it changed size recently tonight.
> Looking in /bin, all of the following files are also larger than they
> were and have all changed size at the same time:
>
> -rwxr-xr-x 1 root root 80044 Jul 31 21:29 vdir
> -rwxr-xr-x 1 root root 34456 Jul 31 21:29 touch
> -rwxr-xr-x 1 root root 9716 Jul 31 21:29 tempfile
> -rwxr-xr-x 1 root root 16312 Jul 31 21:29 sync
> -rwxr-xr-x 1 root root 17944 Jul 31 21:29 rmdir
> -rwxr-xr-x 1 root root 34808 Jul 31 21:29 rm
> -rwxr-xr-x 1 root root 17944 Jul 31 21:29 readlink
> -rwxr-xr-x 1 root root 9672 Jul 31 21:29 mktemp
> -rwxr-xr-x 1 root root 23276 Jul 31 21:29 mknod
> -rwxr-xr-x 1 root root 24984 Jul 31 21:29 mkdir
> -rwxr-xr-x 1 root root 80044 Jul 31 21:29 ls
> -rwxr-xr-x 1 root root 27192 Jul 31 21:29 ln
> -rwxr-xr-x 1 root root 57772 Jul 31 21:29 gzip
> -rwxr-xr-x 1 root root 80044 Jul 31 21:29 dir
> -rwxr-xr-x 1 root root 35820 Jul 31 21:29 df
> -rwxr-xr-x 1 root root 32684 Jul 31 21:29 dd
> -rwxr-xr-x 1 root root 55308 Jul 31 21:29 cp
> -rwxr-xr-x 1 root root 38668 Jul 31 21:29 chown
> -rwxr-xr-x 1 root root 35308 Jul 31 21:29 chmod
>
> all of them slightly larger than what they should be. When I run the
> corrupt version of /bin/ls I get the following:

what process ran at that time? maybe an automatic fsck that is
fsck'ing (heh!) the drive?

> # /bin/ls
> Segmentation fault
>
> I've just written a script to watch the files changing so it restores
> them, but that's no fix at all. I've tried to ascertain why they are
> changing but cannot get to the bottom of it; sometimes it's actually
> while I'm on the system. Strangely, I also have a copy in /tmp that
> I've had there all day and that's never been corrupted. It has the same
> ownership, permissions etc. as the one in /bin/ls.

have you tried changing ownership/permissions to see if you can narrow
down the source of this? Also, using your script above, get a snapshot
of what processes are running at the time you see the corruption.
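Something along these lines is what I have in mind for the snapshot. It is only a rough sketch, written in Python so that it does not rely on the tools in /bin that keep getting corrupted: it polls a list of files, and when a checksum changes it appends the current process list (read straight from /proc) to a log. The function names, the log path and the polling interval are all made up for illustration, and it assumes a Linux-style /proc. Because it polls, a short-lived process that did the damage may already be gone by the time the snapshot is taken, so treat the output as a hint rather than proof.

```python
import hashlib
import os
import time


def file_digest(path):
    """Return the SHA-256 hex digest of a file, or None if it cannot be read."""
    h = hashlib.sha256()
    try:
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
    except OSError:
        return None
    return h.hexdigest()


def snapshot_processes(proc_root="/proc"):
    """Return a sorted list of (pid, command line) read directly from /proc."""
    procs = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # the process exited while we were looking
        # Arguments are NUL-separated; kernel threads have an empty cmdline.
        cmd = raw.replace(b"\0", b" ").decode("utf-8", "replace").strip()
        procs.append((int(entry), cmd))
    return sorted(procs)


def changed_files(paths, baseline):
    """Return the paths whose digest differs from baseline, updating baseline."""
    changed = []
    for path in paths:
        digest = file_digest(path)
        if baseline.get(path) != digest:
            changed.append(path)
            baseline[path] = digest
    return changed


def watch(paths, log_path, interval=5.0):
    """Poll forever; whenever a file changes, log it with the process list."""
    baseline = {p: file_digest(p) for p in paths}
    while True:
        time.sleep(interval)
        changed = changed_files(paths, baseline)
        if not changed:
            continue
        with open(log_path, "a") as log:
            log.write("%s changed: %s\n" % (time.ctime(), ", ".join(changed)))
            for pid, cmd in snapshot_processes():
                log.write("  %d %s\n" % (pid, cmd))
```

It could be started with something like `watch(["/bin/ls", "/bin/rm"], "/root/bin-watch.log")`, ideally with the log on a different partition from the one that is misbehaving. A shorter interval improves the odds of catching the culprit in the process list.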
I think you probably need to get a really good run of ps outputs so you
can find something running at the time of corruption.

> Also I've got various commands hanging around in a ps listing, either
> supposedly still running or defunct, e.g.
>
> root 2142 1 0 Jul27 ? 00:00:00 mv ls.corrupt ls
>
> From the other day when this occurred, and
>
> root 2143 2142 0 Jul27 ? 00:00:00 [mv] <defunct>
>
> And I have a few hundred lines like this.

I would assume you need to get rid of these guys... do they respond to
a kill or kill -9? could these be caused by disk corruption at the time
of the mv, causing the process to hang around?

> Very strange and I'm pulling my hair out at the moment trying to
> figure it out. I've not rebooted the system as I'm remote from it and
> I don't want to take the chance of it not coming back while I'm not
> there.

you might have to take that trip and get it rebooted, especially if you
end up with unkillable processes.

> As I say I have inherited the system and have no real prior knowledge
> of the box or what our old admin did on there, so any help greatly
> appreciated.
>

multi-user setup? can you start locking out users and see if one of
them is somehow causing it?

hth

A
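On the defunct entries: as far as I know a defunct (zombie) process is already dead, so kill -9 does nothing to it. It only disappears once its parent collects it or the parent itself exits, so the parents are the ones worth looking at. Here is a rough sketch that lists each zombie together with its parent and the parent's state. The function names are my own, and it assumes a Linux-style /proc where /proc/<pid>/stat starts with "pid (comm) state ppid". Reading /proc directly again avoids depending on the binaries in /bin.

```python
import os


def parse_stat(text):
    """Parse the start of a /proc/<pid>/stat line into (pid, comm, state, ppid)."""
    # comm is wrapped in parentheses and may itself contain spaces or
    # parentheses, so split on the first "(" and the last ")".
    lpar = text.index("(")
    rpar = text.rindex(")")
    pid = int(text[:lpar])
    comm = text[lpar + 1:rpar]
    fields = text[rpar + 1:].split()
    return pid, comm, fields[0], int(fields[1])


def read_processes(proc_root="/proc"):
    """Return {pid: (comm, state, ppid)} for every readable process."""
    procs = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            path = os.path.join(proc_root, entry, "stat")
            with open(path, "r", errors="replace") as f:
                pid, comm, state, ppid = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process vanished or the line was not in the expected form
        procs[pid] = (comm, state, ppid)
    return procs


def zombies_with_parents(proc_root="/proc"):
    """Return sorted (zombie pid, zombie comm, ppid, parent comm, parent state)."""
    procs = read_processes(proc_root)
    result = []
    for pid, (comm, state, ppid) in procs.items():
        if state != "Z":
            continue
        # The parent may be missing from the table if it could not be read.
        parent_comm, parent_state, _ = procs.get(ppid, ("?", "?", 0))
        result.append((pid, comm, ppid, parent_comm, parent_state))
    return sorted(result)
```

If a parent shows up in state D (uninterruptible sleep, usually waiting on I/O), it will not respond to signals either, which would fit the disk-trouble theory and is the case where a reboot is probably the only way out.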