Your message dated Wed, 12 Aug 2015 12:10:05 +0200 with message-id <20150812101004.GA18483@crossbow> and subject line Re: race condition has caused the Debian Bug report #442189, regarding infinite(?) loop during update, possible race condition to be marked as done.

This means that you claim that the problem has been dealt with. If this is not the case it is now your responsibility to reopen the Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this message is talking about, this may indicate a serious mail system misconfiguration somewhere. Please contact owner@bugs.debian.org immediately.)

--
442189: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=442189
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
- To: <submit@bugs.debian.org>
- Subject: infinite(?) loop during update, possible race condition
- From: Vincent McIntyre <Vince.McIntyre@atnf.csiro.au>
- Date: Fri, 14 Sep 2007 08:43:27 +1000 (EST)
- Message-id: <Pine.SOL.4.33.0709140839220.6331-100000@venice.tip.CSIRO.AU>
Package: apt
Version: 0.6.46.4-0.1
Severity: important

*** Please type your report below this line ***

Hi,

I could not see an existing report of this problem, e.g. I don't think this is #409336.

Background
----------
I have a nightly cron job that installs security updates only, using a short sources list file:

  deb http://debian-archive.atnf.csiro.au:9999/security/ etch/updates main contrib non-free
  deb-src http://debian-archive.atnf.csiro.au:9999/security/ etch/updates main contrib non-free

I'm using an apt-proxy to cache the packages & spare your mirrors. This system has worked reliably since midway through the woody release.

The script is breaking on one of the machines I have that is running etch. I have one other machine that exhibited the same underlying problem though (see below). It is working fine on our sarge machines.

The sequence of events in the job is (leaving out various sanity checks):

1. /usr/bin/apt-get -o=Dir::Etc::SourceList=/etc/apt/sources.list.security-only clean
2. /usr/bin/apt-get -o=Dir::Etc::SourceList=/etc/apt/sources.list.security-only update || \
   (sleep 5; /usr/bin/apt-get -o=Dir::Etc::SourceList=/etc/apt/sources.list.security-only update)
3. [ if there are any packages to do, filter out certain classes of package such as kernels and database, then continue as follows ]
4. /usr/bin/apt-get -o=Dir::Etc::SourceList=/etc/apt/sources.list.security-only upgrade
5. /usr/bin/apt-get -o=Dir::Etc::SourceList=/etc/apt/sources.list.security-only clean
6. /usr/bin/apt-get update

The first 'update' call (step 2) is failing with status of 100. The error messages given are:

  E: Could not get lock /var/lib/apt/lists/lock - open (11 Resource temporarily unavailable)
  E: Unable to lock the list directory
  E: Could not get lock /var/lib/apt/lists/lock - open (11 Resource temporarily unavailable)
  E: Unable to lock the list directory

Analysis
--------
The failure above seems to be because of another apt-get process that is running, apparently indefinitely.
I have seen one running for at least 8 days. Attaching to one such process with strace (it had been running for 2 days), the process appears to be in an infinite loop:

  # strace -p 9809
  Process 9809 attached - interrupt to quit
  select(10, [5 6 7 9], [], NULL, {0, 168000}) = 0 (Timeout)
  stat64("/var/lib/apt/lists/partial/debian-archive:9999_atnf_dists_etch_non-free_binary-i386_Packages.decomp", 0xbfce3fa0) = -1 ENOENT (No such file or directory)
  select(10, [5 6 7 9], [], NULL, {0, 500000}) = 0 (Timeout)
  stat64("/var/lib/apt/lists/partial/debian-archive:9999_atnf_dists_etch_non-free_binary-i386_Packages.decomp", 0xbfce3fa0) = -1 ENOENT (No such file or directory)
  select(10, [5 6 7 9], [], NULL, {0, 500000} <unfinished ...>
  Process 9809 detached
  #

I was able to get another apt-get process into this state, and saw the same output, but for a different file:

  # strace -p 30990
  Process 30990 attached - interrupt to quit
  select(10, [5 6 7 9], [], NULL, {0, 12000}) = 0 (Timeout)
  stat64("/var/lib/apt/lists/partial/debian-archive.atnf.csiro.au:9999_atnf_dists_etch_main_source_Sources.decomp", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
  select(10, [5 6 7 9], [], NULL, {0, 500000}) = 0 (Timeout)
  stat64("/var/lib/apt/lists/partial/debian-archive.atnf.csiro.au:9999_atnf_dists_etch_main_source_Sources.decomp", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
  select(10, [5 6 7 9], [], NULL, {0, 500000}) = 0 (Timeout)
  ...

There were some inconsistencies between the normal sources.list and the truncated one shown above (which leads to differently named files in /var/lib/apt/lists and possibly a 'disappearing' file), but these naming inconsistencies also occur on other hosts that don't show this problem, so I don't think the inconsistency is the source of the problem.

Following this, I cleaned up and tried to reproduce the problem.
* I made the two files consistent wrt the naming of the proxy host (so the files in /var/lib/apt/lists will be consistent too)
* killed all the apt-get processes (just kill $pid, not kill -9)
* removed all the files in /var/lib/apt/lists and below (but I left the partial/ dir itself)
* apt-get clean
* apt-get update
* apt-get upgrade (nothing to do)
* ran the cron job.

This time the script executed correctly.

I checked other machines running etch. I noticed that there was one other with an apt-get process in the same state (infinite loop on a .decomp file).

For a time I thought this might be related to libc6; the ones with the problem all had the latest libc6, while the others (also running etch) had an earlier version (2.3.6.ds1-13 vs 2.3.6.ds1-13etch2). However I first noticed this problem with a process that was started on 20 Aug 2007 and possibly it was occurring as early as 12 Aug 2007. On that host, libc6 was only updated to -13etch2 on 29 Aug 2007.

I also wondered if it was related to the recently corrected memory corruption issue (fixed with kernel 2.6.18-5-686). It appears not, after upgrading various machines to that kernel.

After waiting and watching for a couple of weeks, the problem is persisting on the machine where I first noticed it occurring. The other is not showing it now.

Summary
-------
I have perhaps included a lot of extraneous information but the issue here seems to be that:

- if apt-get cannot find a .decomp file it is looking for, it goes into an infinite loop.
- there may be a race condition that causes .decomp files to disappear before apt-get is quite finished with them.

Probably I have a subtle misconfiguration here but I can't see it. Any help would be welcome.
Kind regards
Vince

-- Package-specific info:

-- apt-config dump --

APT "";
APT::Architecture "i386";
APT::Build-Essential "";
APT::Build-Essential:: "build-essential";
APT::Authentication "";
APT::Authentication::TrustCDROM "true";
Dir "/";
Dir::State "var/lib/apt/";
Dir::State::lists "lists/";
Dir::State::cdroms "cdroms.list";
Dir::State::userstatus "status.user";
Dir::State::status "/var/lib/dpkg/status";
Dir::Cache "var/cache/apt/";
Dir::Cache::archives "archives/";
Dir::Cache::srcpkgcache "srcpkgcache.bin";
Dir::Cache::pkgcache "pkgcache.bin";
Dir::Etc "etc/apt/";
Dir::Etc::sourcelist "sources.list";
Dir::Etc::sourceparts "sources.list.d";
Dir::Etc::vendorlist "vendors.list";
Dir::Etc::vendorparts "vendors.list.d";
Dir::Etc::main "apt.conf";
Dir::Etc::parts "apt.conf.d";
Dir::Etc::preferences "preferences";
Dir::Bin "";
Dir::Bin::methods "/usr/lib/apt/methods";
Dir::Bin::dpkg "/usr/bin/dpkg";
DPkg "";
DPkg::Pre-Install-Pkgs "";
DPkg::Pre-Install-Pkgs:: "/usr/sbin/dpkg-preconfigure --apt || true";

-- /etc/apt/preferences --

Explanation: Low priority to avoid installation unless explicitly required
Package: *
Pin: release a=etch-backports
Pin-Priority: 200

-- /etc/apt/sources.list --

#
#
deb http://debian-archive.atnf.csiro.au:9999/debian/ etch main contrib non-free
deb-src http://debian-archive.atnf.csiro.au:9999/debian/ etch main contrib non-free
deb http://debian-archive.atnf.csiro.au:9999/security etch/updates main contrib non-free
deb-src http://debian-archive.atnf.csiro.au:9999/security etch/updates main contrib non-free
deb http://debian-archive.atnf.csiro.au:9999/atnf etch main contrib non-free
deb-src http://debian-archive.atnf.csiro.au:9999/atnf etch main contrib non-free
# not for routine use
#deb http://debian-archive.atnf.csiro.au:9999/backports etch main contrib non-free
#deb-src http://debian-archive.atnf.csiro.au:9999/backports etch main contrib non-free

-- System Information:
Debian Release: 4.0
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (i686)
Shell: /bin/sh linked to /bin/bash
Kernel: Linux 2.6.18-4-686
Locale: LANG=en_AU.UTF-8, LC_CTYPE=en_AU.UTF-8 (charmap=UTF-8)

Versions of packages apt depends on:
ii  debian-archive-keyring  2007.07.31~etch1   GnuPG archive keys of the Debian a
ii  libc6                   2.3.6.ds1-13etch2  GNU C Library: Shared libraries
ii  libgcc1                 1:4.1.1-21         GCC support library
ii  libstdc++6              4.1.1-21           The GNU Standard C++ Library v3

apt recommends no packages.

-- no debconf information
--- End Message ---
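The retry in step 2 of the reported cron job waits 5 seconds and tries once more, but nothing bounds how long a single apt-get run may take, so one hung process can hold the list lock for days. A minimal sketch of one possible mitigation follows, assuming the `timeout` utility from GNU coreutils is available (on etch-era systems it may not have been). The function name `run_bounded` and its parameters are hypothetical and not part of apt; this works around the symptom and does not fix the underlying loop.

```shell
#!/bin/sh
# run_bounded LIMIT PAUSE COMMAND [ARGS...]
# Hypothetical helper (not part of apt): run COMMAND under a time limit of
# LIMIT seconds; if it fails or is killed, sleep PAUSE seconds and try once
# more. The return status is that of the last attempt.
# `timeout` sends SIGTERM by default and exits with 124 when the limit hits.
run_bounded() {
    limit=$1
    pause=$2
    shift 2
    if timeout "$limit" "$@"; then
        return 0
    fi
    sleep "$pause"
    timeout "$limit" "$@"
}
```

In the job described above, step 2 could then read something like `run_bounded 600 5 /usr/bin/apt-get -o=Dir::Etc::SourceList=/etc/apt/sources.list.security-only update`, where the 600-second limit is an arbitrary illustrative value. The report notes that a plain `kill $pid` was enough to stop the stuck processes, which is why the default SIGTERM is assumed to suffice here.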
--- Begin Message ---
- To: 442189-done@bugs.debian.org
- Subject: Re: race condition
- From: David Kalnischkies <david@kalnischkies.de>
- Date: Wed, 12 Aug 2015 12:10:05 +0200
- Message-id: <20150812101004.GA18483@crossbow>
- In-reply-to: <cone.1239261975.760693.3361.1000@toolshiner.phx1.kidfixit.com>
- References: <cone.1239261975.760693.3361.1000@toolshiner.phx1.kidfixit.com>
Hi,

On Thu, Apr 09, 2009 at 12:26:15AM -0700, Joey Korkames wrote:
> As Eugugene stated, it's because of a race condition.

He didn't mention which code he suspected, but I suspect he suspected ReadMessages, which is by now fixed to not run into an endless loop on an unfortunate event of having two messages split at the 64000 character boundary – which looked like the straces in this bugreport.

> Another related bug, #439031, indicated that multiple gpgv threads run on
> the same Releases files (once for a deb, again for deb-src line in
> sources.list) causes a similar stall.

Which is a report I closed just now as it doesn't happen anymore (and didn't stall, it reported an error instantly…).

So, given that I really predict it was ReadMessages, that we rewrote the acquire system for apt 1.1, and that 6 years have passed without further comments, I declare this bug done, but if this is still reproducible feel free to reopen it!

Best regards

David Kalnischkies

Attachment: signature.asc
Description: Digital signature
--- End Message ---
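For anyone checking whether the hang still reproduces, the symptom in the original report was an apt-get process that kept running for days. The sketch below filters a process listing for apt-get entries older than a threshold. The function name `stale_apt_pids` is hypothetical, and the input format (PID, elapsed seconds, command name per line) is an assumption chosen to match what `ps -eo pid=,etimes=,comm=` prints on systems whose `ps` supports the `etimes` field (procps-ng; older `ps` versions may only offer `etime`).

```shell
#!/bin/sh
# stale_apt_pids THRESHOLD_SECONDS
# Hypothetical helper: reads lines of "PID ELAPSED_SECONDS COMMAND" on stdin
# and prints the PID of every apt-get process whose elapsed time exceeds the
# threshold. The "+ 0" forces a numeric comparison in awk.
stale_apt_pids() {
    awk -v limit="$1" '$3 == "apt-get" && $2 + 0 > limit + 0 { print $1 }'
}

# Possible usage (assumes procps-ng ps with the "etimes" field):
#   ps -eo pid=,etimes=,comm= | stale_apt_pids 86400
```

Feeding it the made-up sample lines `9809 172800 apt-get` and `30990 60 apt-get` with a threshold of 86400 (one day) would print only `9809`; the elapsed times here are illustrative, chosen to echo the two-day-old process mentioned in the report.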