
Bug#442189: infinite(?) loop during update, possible race condition



Package: apt
Version: 0.6.46.4-0.1
Severity: important

Hi,

I could not find an existing report of this problem; in particular, I
don't think this is #409336.

Background
----------
I have a nightly cron job that installs security updates only, using a
short sources list file:
  deb     http://debian-archive.atnf.csiro.au:9999/security/  etch/updates main contrib non-free
  deb-src http://debian-archive.atnf.csiro.au:9999/security/  etch/updates main contrib non-free
I'm using an apt-proxy to cache the packages & spare your mirrors.
This system has worked reliably since midway through the woody release.

The script is breaking on one of the machines I have that is running etch.
I have one other machine that exhibited the same underlying problem though
(see below). It is working fine on our sarge machines.


The sequence of events in the job is (leaving out various sanity checks):
 1.  /usr/bin/apt-get -o=Dir::Etc::SourceList=/etc/apt/sources.list.security-only clean
 2.  /usr/bin/apt-get -o=Dir::Etc::SourceList=/etc/apt/sources.list.security-only update || \
     (sleep 5; /usr/bin/apt-get -o=Dir::Etc::SourceList=/etc/apt/sources.list.security-only update)
 3.  [ if there are any packages to do, filter out certain classes of package
      such as kernels and database, then continue as follows ]
 4.  /usr/bin/apt-get -o=Dir::Etc::SourceList=/etc/apt/sources.list.security-only upgrade
 5.  /usr/bin/apt-get -o=Dir::Etc::SourceList=/etc/apt/sources.list.security-only clean
 6.  /usr/bin/apt-get update
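
For anyone trying to reproduce this, the update-with-retry step (step 2)
can be sketched roughly as below. This is not the actual cron script: the
function names and the APT_GET, SOURCES and RETRY_DELAY variables are
made up for illustration, and are overridable so the logic can be tried
without touching a real system.

```shell
#!/bin/sh
# Illustrative sketch only; names below are assumptions, not the real job.
APT_GET=${APT_GET:-/usr/bin/apt-get}
SOURCES=${SOURCES:-/etc/apt/sources.list.security-only}

# Run "apt-get <args>" against the security-only sources list.
apt_security() {
    "$APT_GET" -o=Dir::Etc::SourceList="$SOURCES" "$@"
}

# Step 2: update, retrying once after a short pause if the first try fails.
# Returns the status of the last attempt.
update_with_retry() {
    apt_security update && return 0
    sleep "${RETRY_DELAY:-5}"
    apt_security update
}
```

The other steps would then be calls such as "apt_security clean" and
"apt_security upgrade" around update_with_retry.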


The first 'update' call (step 2) is failing with status of 100.
The error messages given are:
  E: Could not get lock /var/lib/apt/lists/lock - open (11 Resource temporarily unavailable)
  E: Unable to lock the list directory
  E: Could not get lock /var/lib/apt/lists/lock - open (11 Resource temporarily unavailable)
  E: Unable to lock the list directory


Analysis
--------

The failure above seems to be because of another apt-get process that is
running, apparently indefinitely. I have seen one running for at least 8 days.
Attaching to one such process with strace (it had been running for 2 days),
the process appears to be in an infinite loop:
  # strace -p 9809
  Process 9809 attached - interrupt to quit
  select(10, [5 6 7 9], [], NULL, {0, 168000}) = 0 (Timeout)
  stat64("/var/lib/apt/lists/partial/debian-archive:9999_atnf_dists_etch_non-free_binary-i386_Packages.decomp", 0xbfce3fa0) = -1 ENOENT (No such file or directory)
  select(10, [5 6 7 9], [], NULL, {0, 500000}) = 0 (Timeout)
  stat64("/var/lib/apt/lists/partial/debian-archive:9999_atnf_dists_etch_non-free_binary-i386_Packages.decomp", 0xbfce3fa0) = -1 ENOENT (No such file or directory)
  select(10, [5 6 7 9], [], NULL, {0, 500000} <unfinished ...>
  Process 9809 detached
  #

I was able to get another apt-get process into this state, and saw the same
output, but for a different file:
  # strace -p 30990
  Process 30990 attached - interrupt to quit
  select(10, [5 6 7 9], [], NULL, {0, 12000}) = 0 (Timeout)
  stat64("/var/lib/apt/lists/partial/debian-archive.atnf.csiro.au:9999_atnf_dists_etch_main_source_Sources.decomp", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
  select(10, [5 6 7 9], [], NULL, {0, 500000}) = 0 (Timeout)
  stat64("/var/lib/apt/lists/partial/debian-archive.atnf.csiro.au:9999_atnf_dists_etch_main_source_Sources.decomp", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
  select(10, [5 6 7 9], [], NULL, {0, 500000}) = 0 (Timeout)
...
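
To find such long-running processes on other hosts, a filter along these
lines may help. It is only a sketch: the function name stale_apt_pids is
made up, the one-day default threshold is arbitrary, and the usage line
assumes a ps that supports the "etimes" (elapsed seconds) field, which
the procps in etch may not.

```shell
# Read "PID ELAPSED_SECONDS COMMAND" lines on stdin and print the PIDs of
# apt-get processes that have been running longer than the given number of
# seconds (default: 86400, i.e. one day).
stale_apt_pids() {
    awk -v max="${1:-86400}" '$3 == "apt-get" && $2 > max { print $1 }'
}

# Assumed usage with a procps-ng style ps:
#   ps -eo pid=,etimes=,comm= | stale_apt_pids 86400
```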

There were some inconsistencies between the normal sources.list and the
truncated one shown above (which lead to differently named files in
/var/lib/apt/lists and possibly a 'disappearing' file), but these naming
inconsistencies also occur on other hosts that don't show this problem, so
I don't think they are the source of the problem.

Following this, I cleaned up and tried to reproduce the problem.
 * I made the two files consistent wrt the naming of the proxy host
   (so the files in /var/lib/apt/lists will be consistent too)
 * killed all the apt-get processes
   (just kill $pid, not kill -9)
 * removed all the files in /var/lib/apt/lists and below
   (but I left the partial/ dir itself)
 * apt-get clean
 * apt-get update
 * apt-get upgrade (nothing to do)
 * ran the cron job.
This time the script executed correctly.
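
The "remove all the files but keep partial/" step above can be expressed
roughly as follows. This is a sketch rather than the exact command I
typed; clear_lists is a made-up name, and it takes the directory as an
argument so it can be tried on a scratch copy first.

```shell
# Remove every regular file under the given lists directory while leaving
# the directory tree itself (including partial/) in place.
clear_lists() {
    find "${1:?usage: clear_lists DIR}" -type f -exec rm -f {} +
}

# Assumed usage, as root and with no apt-get running:
#   clear_lists /var/lib/apt/lists
```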

I checked other machines running etch. I noticed that there was one
other with an apt-get process in the same state (infinite loop on a
.decomp file).

For a time I thought this might be related to libc6: the machines with the
problem all had the latest libc6 (2.3.6.ds1-13etch2), while the others
(also running etch) had an earlier version (2.3.6.ds1-13).
However I first noticed this problem with a process that was started on
20 Aug 2007 and possibly it was occurring as early as 12 Aug 2007.
On that host, libc6 was only updated to -13etch2 on 29 Aug 2007.

I also wondered if it was related to the recently corrected memory
corruption issue (fixed with kernel 2.6.18-5-686). It appears not,
after upgrading various machines to that kernel.

After waiting and watching for a couple of weeks, the problem persists
on the machine where I first noticed it. The other machine is not showing
it now.

Summary
-------

I have perhaps included a lot of extraneous information but the issue here
seems to be that:
  - if apt-get cannot find a .decomp file it is looking for,
    it goes into an infinite loop.
  - there may be a race condition that causes .decomp files to disappear
    before apt-get is quite finished with them.
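
Until the cause is understood, one possible stopgap (a workaround sketch,
not a fix) is to put an upper bound on how long the update may run, so
that a looping apt-get cannot hold the list lock for days. The sketch
below assumes the GNU coreutils "timeout" command, which is not part of
etch's coreutils; run_bounded is a made-up name. It relies on a plain
SIGTERM being enough, which matches what I saw when killing the stuck
processes by hand.

```shell
# Run a command, sending it SIGTERM if it exceeds LIMIT seconds.
# Returns the command's own exit status, or 124 if the limit was reached
# (this is GNU timeout's convention).
run_bounded() {
    limit=$1
    shift
    timeout "$limit" "$@"
}

# Assumed usage in the cron job (600 seconds is an arbitrary choice):
#   run_bounded 600 /usr/bin/apt-get update
```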

Probably I have a subtle misconfiguration here but I can't see it.
Any help would be welcome.


Kind regards
Vince


-- Package-specific info:

-- apt-config dump --

APT "";
APT::Architecture "i386";
APT::Build-Essential "";
APT::Build-Essential:: "build-essential";
APT::Authentication "";
APT::Authentication::TrustCDROM "true";
Dir "/";
Dir::State "var/lib/apt/";
Dir::State::lists "lists/";
Dir::State::cdroms "cdroms.list";
Dir::State::userstatus "status.user";
Dir::State::status "/var/lib/dpkg/status";
Dir::Cache "var/cache/apt/";
Dir::Cache::archives "archives/";
Dir::Cache::srcpkgcache "srcpkgcache.bin";
Dir::Cache::pkgcache "pkgcache.bin";
Dir::Etc "etc/apt/";
Dir::Etc::sourcelist "sources.list";
Dir::Etc::sourceparts "sources.list.d";
Dir::Etc::vendorlist "vendors.list";
Dir::Etc::vendorparts "vendors.list.d";
Dir::Etc::main "apt.conf";
Dir::Etc::parts "apt.conf.d";
Dir::Etc::preferences "preferences";
Dir::Bin "";
Dir::Bin::methods "/usr/lib/apt/methods";
Dir::Bin::dpkg "/usr/bin/dpkg";
DPkg "";
DPkg::Pre-Install-Pkgs "";
DPkg::Pre-Install-Pkgs:: "/usr/sbin/dpkg-preconfigure --apt || true";

-- /etc/apt/preferences --

Explanation: Low priority to avoid installation unless explicitly required
Package: *
Pin: release a=etch-backports
Pin-Priority: 200

-- /etc/apt/sources.list --

#
#
deb     http://debian-archive.atnf.csiro.au:9999/debian/    etch main contrib non-free
deb-src http://debian-archive.atnf.csiro.au:9999/debian/    etch main contrib non-free

deb     http://debian-archive.atnf.csiro.au:9999/security   etch/updates main contrib non-free
deb-src http://debian-archive.atnf.csiro.au:9999/security   etch/updates main contrib non-free

deb     http://debian-archive.atnf.csiro.au:9999/atnf       etch main contrib non-free
deb-src http://debian-archive.atnf.csiro.au:9999/atnf       etch main contrib non-free

# not for routine use
#deb     http://debian-archive.atnf.csiro.au:9999/backports etch main contrib non-free
#deb-src http://debian-archive.atnf.csiro.au:9999/backports etch main contrib non-free


-- System Information:
Debian Release: 4.0
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.18-4-686
Locale: LANG=en_AU.UTF-8, LC_CTYPE=en_AU.UTF-8 (charmap=UTF-8)

Versions of packages apt depends on:
ii  debian-archive-keyring 2007.07.31~etch1  GnuPG archive keys of the Debian a
ii  libc6                  2.3.6.ds1-13etch2 GNU C Library: Shared libraries
ii  libgcc1                1:4.1.1-21        GCC support library
ii  libstdc++6             4.1.1-21          The GNU Standard C++ Library v3

apt recommends no packages.

-- no debconf information




