Bug#66919: marked as done (wierd unkillable apt-get hang)



Your message dated Fri, 14 Jul 2000 16:56:54 -0600 (MDT)
with message-id <Pine.LNX.3.96.1000714165601.19031F-100000@wakko.deltatee.com>
and subject line Bug#66919: wierd unkillable apt-get hang
has caused the attached Bug report to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what I am
talking about this indicates a serious mail system misconfiguration
somewhere.  Please contact me immediately.)

Darren Benham
(administrator, Debian Bugs database)

--------------------------------------
Received: (at submit) by bugs.debian.org; 8 Jul 2000 12:11:24 +0000
>From dye@c1038623-a.mntp1.il.home.com Sat Jul 08 07:11:24 2000
Return-path: <dye@c1038623-a.mntp1.il.home.com>
Received: from ha1.rdc1.il.home.com (mail.rdc1.il.home.com) [24.2.1.66] (imail)
	by master.debian.org with esmtp (Exim 3.12 2 (Debian))
	id 13AtS8-0005eG-00; Sat, 08 Jul 2000 07:11:24 -0500
Received: from c1038623-a.mntp1.il.home.com ([24.15.102.154])
          by mail.rdc1.il.home.com (InterMail vM.4.01.03.00 201-229-121)
          with ESMTP
          id <20000708121120.RVIU1229.mail.rdc1.il.home.com@c1038623-a.mntp1.il.home.com>
          for <submit@bugs.debian.org>; Sat, 8 Jul 2000 05:11:20 -0700
Received: from c1038623-a.mntp1.il.home.com (dye@localhost [127.0.0.1])
	by c1038623-a.mntp1.il.home.com (8.9.3/8.9.3/Debian/GNU) with ESMTP id HAA08965
	for <submit@bugs.debian.org>; Sat, 8 Jul 2000 07:11:17 -0500
Message-Id: <200007081211.HAA08965@c1038623-a.mntp1.il.home.com>
X-Mailer: exmh version 2.0.2 2/24/98 (debian) 
From: dye@stg.com (Ken R. Dye)
Reply-To: dye@stg.com
Errors-To: dye@stg.com
To: submit@bugs.debian.org
X-URL: http://www.geocities.com/MotorCity/Track/8746
Subject: wierd unkillable apt-get hang
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable
Date: Sat, 08 Jul 2000 07:11:17 -0400
Sender: dye@c1038623-a.mntp1.il.home.com
Delivered-To: submit@bugs.debian.org


Package: apt
Version: 0.3.10slink

here is apt-get -v
apt 0.3.11 for i386 compiled on Aug  8 1999  10:12:36

dye@TRANSAM: 20% uname -a
Linux transam 2.0.36 #5 Thu Jan 20 04:47:43 CST 2000 i586 unknown

Sorry for the poor data on this bug; I thought it was reproducible (it was,
several different attempts) but a second FSCK seemed to clean up the problem.

From a remote system via telnet, was running dselect to install a single
package, zip.  When I performed the <install> operation, the telnet session
hung.  Further attempts to telnet to the machine failed, as well as all
ftp, www, smtp attempts.  Ping however, succeeded.

When I got home, was greeted with a blank screen; virtual terminals did
not work.

A power cycle brought up an FSCK that required a manual FSCK which revealed
filesystem damage, including /var/cache/apt inodes, and if I remember
correctly, /var/lib/dpkg/cache stuff.

I tried to continue the dselect process, but it hung the first thing I
tried, which was install, I believe.  The machine did not seize up as
before, but top revealed that apt-get -update was taking 98% of the
CPU time.

I let this run for several minutes, then tried "strace" on the apt-get
pid to try to see what it was hung on.  No output from strace, so no
system calls were being made; it was in a tight infinite loop.

Tried killing the apt-get pid, all attempts from -3, -15 and even -9 were
unsuccessful and yielded no strace output either.  Tried
"gdb /usr/bin/apt-get pid#", which worked but hung as well.

A reboot of the system again forced a manual FSCK, which brought up
more cache directory errors.   I tried another dselect (had to remove a lock
file) which ran apt-get which again hung, so as I thought it was reproducible
and getting kind of late, I went to bed and left the problem alone for a
couple of days.  I searched your bug list and found nothing of this sort,
started to file it and get more info by reproducing the problem, but a power
failure or stupid-brother-in-law reboot seems to have cleared up the problem.

Could you get the package-maintainer to look at this?  My guess is the
cache-scanning code is under interrupt lock (though I don't know how kill
-9 didn't work) and getting hung by a corrupted cache.

I consider it a huge bug that the process did not respond to interrupts;
I realize that you don't want to leave a FUBAR'ed cache laying around but
think it would be better on more severe interrupts (QUIT or better) to
mark the cache invalid and exit...

--Ken "dye@stg.com" Dye




---------------------------------------
Received: (at 66919-done) by bugs.debian.org; 14 Jul 2000 22:57:31 +0000
>From jgg@ualberta.ca Fri Jul 14 17:57:31 2000
Return-path: <jgg@ualberta.ca>
Received: from tmmi197-073.telusvelocity.net (wakko.deltatee.com) [209.115.197.73] (mail)
	by master.debian.org with esmtp (Exim 3.12 2 (Debian))
	id 13DEOh-0002fp-00; Fri, 14 Jul 2000 17:57:31 -0500
Received: from localhost (wakko.deltatee.com) [127.0.0.1] (jgg)
	by wakko.deltatee.com with smtp (Exim 2.11 #1)
	id 13DEO6-0005mO-00 (Debian); Fri, 14 Jul 2000 16:56:54 -0600
Date: Fri, 14 Jul 2000 16:56:54 -0600 (MDT)
From: Jason Gunthorpe <jgg@ualberta.ca>
X-Sender: jgg@wakko.deltatee.com
To: dye@stg.com, 66919-done@bugs.debian.org
cc: APT Development Team <deity@lists.debian.org>
Subject: Re: Bug#66919: wierd unkillable apt-get hang
In-Reply-To: <200007081211.HAA08965@c1038623-a.mntp1.il.home.com>
Message-ID: <Pine.LNX.3.96.1000714165601.19031F-100000@wakko.deltatee.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Delivered-To: 66919-done@bugs.debian.org



On Sat, 8 Jul 2000, Ken R. Dye wrote:

> Tried killing the apt-get pid, all attempts from -3, -15 and even -9 were
> unsuccessful and yielded no strace output either.  Tried 
> "gdb /usr/bin/apt-get pid#", which worked but hung as well.

Sounds like it was stuck in the 'D' state, kernel problem, possibly faulty
hardware. Nothing to do with APT.
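
For anyone hitting the same symptom: a process in the 'D' state is in
uninterruptible sleep inside the kernel, and signals (including kill -9)
stay pending until the blocking operation returns. A minimal sketch of
checking for that state, assuming a Linux /proc filesystem; the helper
names parse_state and process_state are hypothetical and are not part of
APT or any tool mentioned in this report:

```python
def parse_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')' characters, so split after the LAST ')'.
    rest = stat_line[stat_line.rindex(")") + 1:]
    return rest.split()[0]


def process_state(pid):
    """Read the state of a live process (assumes Linux procfs is mounted)."""
    with open("/proc/%d/stat" % pid) as f:
        return parse_state(f.read())


def is_uninterruptible(pid):
    """True if the process is currently in 'D' (uninterruptible sleep)."""
    return process_state(pid) == "D"
```

Calling is_uninterruptible() with the pid of the stuck process would
report whether it is in that state at the moment of the check; the same
letter appears in the STAT column of ps.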

> Could you get the package-maintainer to look at this?  My guess is the
> cache-scanning code is under interrupt lock (though I don't know how kill
> -9 didn't work) and getting hung by a corrupted cache.

No.

Jason


