
Bug#66919: weird unkillable apt-get hang



Package: apt
Version: 0.3.10slink

here is apt-get -v
apt 0.3.11 for i386 compiled on Aug  8 1999  10:12:36

dye@TRANSAM: 20% uname -a
Linux transam 2.0.36 #5 Thu Jan 20 04:47:43 CST 2000 i586 unknown

Sorry for the poor data on this bug; I thought it was reproducible (and it
was, over several different attempts), but a second FSCK seemed to clean up
the problem.

From a remote system via telnet, I was running dselect to install a single
package, zip.  When I performed the <install> operation, the telnet session
hung.  Further attempts to telnet to the machine failed, as did all
ftp, www, and smtp attempts.  Ping, however, succeeded.

When I got home, I was greeted with a blank screen; virtual terminals did
not work.

A power cycle brought up a boot-time FSCK that demanded a manual FSCK, which
revealed filesystem damage, including /var/cache/apt inodes and, if I
remember correctly, /var/lib/dpkg/cache stuff.

I tried to continue the dselect process, but it hung on the first thing I
tried, which was install, I believe.  The machine did not seize up as
before, but top revealed that apt-get -update was taking 98% of the
CPU time.

I let this run for several minutes, then tried "strace" on the apt-get
pid to try to see what it was hung on.  No output from strace, so no
system calls were being made; it was in a tight infinite loop.
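
Silence from strace only says that no system call was entered or returned
while it was watching; the loop could be in user space or inside a single
system call that never returns.  One more check that might separate the two
is sketched below, assuming a Linux /proc and the field layout documented in
proc(5) (state is field 3, utime field 14, stime field 15; that a 2.0 kernel
matches exactly is an assumption).  The function names are made up for
illustration.

```python
def parse_proc_stat(stat_line):
    """Return (state, utime_ticks, stime_ticks) from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')' characters, so split on the LAST closing parenthesis.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    state = rest[0]        # field 3: R, S, D, Z, T, ...
    utime = int(rest[11])  # field 14: clock ticks spent in user mode
    stime = int(rest[12])  # field 15: clock ticks spent in kernel mode
    return state, utime, stime


def read_proc_stat(pid):
    """Hypothetical helper: read and parse /proc/<pid>/stat (Linux only)."""
    with open("/proc/%d/stat" % pid) as f:
        return parse_proc_stat(f.read())
```

Sampling read_proc_stat(pid) twice a few seconds apart would show which
counter is growing.  State R with stime climbing and utime flat would point
at the kernel rather than at apt-get's own code; state D would mean an
uninterruptible sleep.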

Tried killing the apt-get pid; all attempts, from -3 (QUIT) and -15 (TERM)
to even -9 (KILL), were unsuccessful and yielded no strace output either.
Tried "gdb /usr/bin/apt-get pid#", which started but hung as well.

A reboot of the system again forced a manual FSCK, which brought up
more cache directory errors.  I tried another dselect (had to remove a lock
file), which ran apt-get, which again hung.  As I thought it was reproducible
and it was getting kind of late, I went to bed and left the problem alone for
a couple of days.  I searched your bug list and found nothing of this sort,
and started to file it and get more info by reproducing the problem, but a
power failure or stupid-brother-in-law reboot seems to have cleared up the
problem.

Could you get the package maintainer to look at this?  My guess is that the
cache-scanning code runs with signals blocked (though I don't know why kill
-9 didn't work) and gets hung by a corrupted cache.

I consider it a huge bug that the process did not respond to interrupts;
I realize that you don't want to leave a FUBAR'ed cache lying around, but
I think it would be better on more severe interrupts (QUIT or better) to
mark the cache invalid and exit...
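
To make that suggestion concrete, here is a minimal sketch in Python rather
than apt's own C++.  The marker file name and both function names are
hypothetical and not anything apt actually uses.  Note that -9 (KILL) cannot
be caught by any process, so this only covers QUIT and TERM.

```python
import os
import signal

MARKER_NAME = "cache.invalid"  # hypothetical marker file, not an apt convention


def make_handler(cache_dir):
    """Build a handler that flags the cache as suspect, then exits."""
    def handler(signum, frame):
        marker = os.path.join(cache_dir, MARKER_NAME)
        # Record which signal interrupted us so the next run can rebuild
        # the cache instead of trusting a half-written one.
        with open(marker, "w") as f:
            f.write("interrupted by signal %d\n" % signum)
        # Conventional shell-style exit status for death by signal.
        raise SystemExit(128 + signum)
    return handler


def install_handlers(cache_dir):
    """Route QUIT and TERM to the handler. SIGKILL cannot be caught."""
    handler = make_handler(cache_dir)
    for sig in (signal.SIGQUIT, signal.SIGTERM):
        signal.signal(sig, handler)
```

A program would call install_handlers() with its cache directory before it
starts scanning, and check for the marker file on the next start.  None of
this helps if the process never gets back to user space to run the handler.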

--Ken "dye@stg.com" Dye





