[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#1055115: bookworm-pu: package prometheus-node-exporter-collectors/0.0~git20230203.6f710f8-1



Package: release.debian.org
Severity: normal
Tags: bookworm
User: release.debian.org@packages.debian.org
Usertags: pu
X-Debbugs-Cc: prometheus-node-exporter-collectors@packages.debian.org
Control: affects -1 + src:prometheus-node-exporter-collectors

[ Reason ]
Since the bookworm upgrade, all hosts with the
prometheus-node-exporter-collectors package install repeatedly hit the
mirrors with spurious apt-update runs. The Debian package
systemd.timer(1) schedule is once on boot and then every 15 minutes
after, which imposes a tremendous load on the mirror system.

It also happens to lock the apt status files which can in turn cause
deadlocks with other programs. There seems to be an (unrelated?)
regression in apt where some (python?) apt program can hang this way
indefinitely. That's tracked separately, e.g.

https://bugs.launchpad.net/ubuntu/+source/apt/+bug/2003851

[ Impact ]
The mirror networks are going to be badly negatively affected by
this. This actually doesn't seem to have been a problem so far,
possibly because deb.debian.org is absorbing everything we can throw
at it and no one is looking at Fastly (or Amazon?), but I suspect this
could become a problem as Prometheus adoption widens.

This also can mean some security upgrades don't get deployed, as the
apt update lockfile gets hung.

[ Tests ]
There's no autopkgtest on this package, but the patches provided here
have been tested in production in about 90 servers for torproject.org
over a few weeks now, fixing the diagnosed issue.

https://gitlab.torproject.org/tpo/tpa/team/-/issues/41355 is our
tracking ticket documenting this.

[ Risks ]
Code is *relatively* simple. It diverges from upstream a little now
because upstream did a little reactoring post bookworm and I didn't
want to make the patch bigger than it absolutely needs to be.

[ Checklist ]
  [x] *all* changes are documented in the d/changelog
  [x] I reviewed all changes and I approve them
  [x] attach debdiff against the package in (old)stable
  [x] the issue is verified as fixed in unstable

[ Changes ]
There are three individual patches, cherry-picked and backported from
upstream. The first one simply stops running apt-update.

The second and third patch add a new metric which keeps track of the
last update timestamp on the apt metadata.

That is important: previously, the script was running apt-update so we
could be pretty sure it was running automatically. But by making this
change, we're *not* running apt-update automatically and assume users
have properly setup something *else* that does.

This was the behaviour in bullseye, for the record, but it's possible
that some users *assume* the wrong (new) behavior was the correct one
and installed the package not knowing they need to deploy their own
APT::Periodic thing.
diff -Nru prometheus-node-exporter-collectors-0.0~git20230203.6f710f8/debian/changelog prometheus-node-exporter-collectors-0.0~git20230203.6f710f8/debian/changelog
--- prometheus-node-exporter-collectors-0.0~git20230203.6f710f8/debian/changelog	2023-02-02 23:57:45.000000000 -0500
+++ prometheus-node-exporter-collectors-0.0~git20230203.6f710f8/debian/changelog	2023-10-31 13:57:52.000000000 -0400
@@ -1,3 +1,10 @@
+prometheus-node-exporter-collectors (0.0~git20230203.6f710f8-1+deb12u1) bookworm; urgency=medium
+
+  * Team upload
+  * Fix deadlock with other apt update runs (Closes: #1028212)
+
+ -- Antoine Beaupré <anarcat@debian.org>  Tue, 31 Oct 2023 13:57:52 -0400
+
 prometheus-node-exporter-collectors (0.0~git20230203.6f710f8-1) unstable; urgency=medium
 
   * New upstream snapshot (Closes: #1030058)
diff -Nru prometheus-node-exporter-collectors-0.0~git20230203.6f710f8/debian/patches/0001-do-not-run-apt-update-or-simulate-apt-dist-upgrade.patch prometheus-node-exporter-collectors-0.0~git20230203.6f710f8/debian/patches/0001-do-not-run-apt-update-or-simulate-apt-dist-upgrade.patch
--- prometheus-node-exporter-collectors-0.0~git20230203.6f710f8/debian/patches/0001-do-not-run-apt-update-or-simulate-apt-dist-upgrade.patch	1969-12-31 19:00:00.000000000 -0500
+++ prometheus-node-exporter-collectors-0.0~git20230203.6f710f8/debian/patches/0001-do-not-run-apt-update-or-simulate-apt-dist-upgrade.patch	2023-10-31 13:57:52.000000000 -0400
@@ -0,0 +1,65 @@
+From 28c179ddfd3d7e0f5bc49b93f924f0dffba5b71d Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Antoine=20Beaupr=C3=A9?= <anarcat@debian.org>
+Date: Fri, 13 Oct 2023 12:29:48 -0400
+Subject: [PATCH] do not run apt update or simulate apt dist-upgrade
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+This is causing all sorts of problems. The first of which is that
+we're hitting our poor mirrors every time the script is ran, which, in
+the Debian package configuration, is *every 15 minutes* (!!).
+
+The second is that this locks the cache and makes this script
+needlessly stumble upon a possible regression in APT from Debian
+bookworm and Ubuntu 22.06:
+
+https://bugs.launchpad.net/ubuntu/+source/apt/+bug/2003851
+
+That still has to be confirmed: it's possible that `apt update` can
+hang for a long time, but that shouldn't concern us if we delegate
+this work out of band.
+
+I also do not believe actually performing the `dist-upgrade`
+calculations is necessary to compute the pending upgrades at all. I've
+done work with python-apt for other projects and haven't found that to
+be required: the cache has the necessary information about pending
+upgrades.
+
+Closes: #179
+
+Signed-off-by: Antoine Beaupré <anarcat@debian.org>
+---
+ apt_info.py | 9 ---------
+ 1 file changed, 9 deletions(-)
+
+diff --git a/apt_info.py b/apt_info.py
+index 59f3ad7..81a276b 100755
+--- a/apt_info.py
++++ b/apt_info.py
+@@ -9,7 +9,6 @@
+ 
+ import apt
+ import collections
+-import contextlib
+ import os
+ 
+ _UpgradeInfo = collections.namedtuple("_UpgradeInfo", ["labels", "count"])
+@@ -90,14 +89,6 @@ def _write_reboot_required():
+ def _main():
+     cache = apt.cache.Cache()
+ 
+-    # First of all, attempt to update the index. If we don't have permission
+-    # to do so (or it fails for some reason), it's not the end of the world,
+-    # we'll operate on the old index.
+-    with contextlib.suppress(apt.cache.LockFailedException, apt.cache.FetchFailedException):
+-        cache.update()
+-
+-    cache.open()
+-    cache.upgrade(True)
+     _write_pending_upgrades(cache)
+     _write_held_upgrades(cache)
+     _write_autoremove_pending(cache)
+-- 
+2.39.2
+
diff -Nru prometheus-node-exporter-collectors-0.0~git20230203.6f710f8/debian/patches/0001-report-the-apt-cache-timestamp.patch prometheus-node-exporter-collectors-0.0~git20230203.6f710f8/debian/patches/0001-report-the-apt-cache-timestamp.patch
--- prometheus-node-exporter-collectors-0.0~git20230203.6f710f8/debian/patches/0001-report-the-apt-cache-timestamp.patch	1969-12-31 19:00:00.000000000 -0500
+++ prometheus-node-exporter-collectors-0.0~git20230203.6f710f8/debian/patches/0001-report-the-apt-cache-timestamp.patch	2023-10-31 13:57:52.000000000 -0400
@@ -0,0 +1,52 @@
+From dc6568ef14f49bddef28befef6f5058d38535cbc Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Antoine=20Beaupr=C3=A9?= <anarcat@debian.org>
+Date: Fri, 13 Oct 2023 13:01:32 -0400
+Subject: [PATCH] report the apt cache timestamp
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+We use the `pkgcache.bin` modification time as a heuristic, but there
+might be a better way to do this.
+
+We also silently ignore errors when pulling that file to avoid broken
+systems from breaking the other metrics. I believe this will lead to a
+null metric which is probably what we want anyway.
+
+A possible improvement to this would be to individually inspect the
+`InRelease` files and report the mirror's timestamps as well, but
+that's more involved. It's also not a priority compared to this fix,
+because we want the update timestamp if we remove the `apt update`
+run (in #181).
+
+Closes: #180
+
+Signed-off-by: Antoine Beaupré <anarcat@debian.org>
+Co-authored-by: Ben Kochie <superq@gmail.com>
+---
+ apt_info.py | 9 +++++++++
+ 1 file changed, 9 insertions(+)
+
+Index: b/apt_info.py
+===================================================================
+--- a/apt_info.py	2023-10-23 14:58:09.558722141 -0400
++++ b/apt_info.py	2023-10-23 15:00:07.250721050 -0400
+@@ -77,6 +77,18 @@ def _write_autoremove_pending(cache):
+     print(f"apt_autoremove_pending {len(autoremovable_packages)}")
+ 
+ 
++def _write_cache_timestamps(registry):
++    print('# HELP apt_package_cache_timestamp_seconds Apt update last run time.')
++    print('# TYPE apt_package_cache_timestamp_seconds gauge')
++    try:
++        print(
++            "apt_package_cache_timestamp_seconds %s"
++            % os.stat('/var/cache/apt/pkgcache.bin').st_mtime
++        )
++    except OSError:
++        pass
++
++
+ def _write_reboot_required():
+     print("# HELP node_reboot_required Node reboot is required for software updates.")
+     print("# TYPE node_reboot_required gauge")
diff -Nru prometheus-node-exporter-collectors-0.0~git20230203.6f710f8/debian/patches/0001-use-a-better-heuristic-for-the-apt-update-last-run-t.patch prometheus-node-exporter-collectors-0.0~git20230203.6f710f8/debian/patches/0001-use-a-better-heuristic-for-the-apt-update-last-run-t.patch
--- prometheus-node-exporter-collectors-0.0~git20230203.6f710f8/debian/patches/0001-use-a-better-heuristic-for-the-apt-update-last-run-t.patch	1969-12-31 19:00:00.000000000 -0500
+++ prometheus-node-exporter-collectors-0.0~git20230203.6f710f8/debian/patches/0001-use-a-better-heuristic-for-the-apt-update-last-run-t.patch	2023-10-31 13:57:52.000000000 -0400
@@ -0,0 +1,60 @@
+From feb943f6df8cbc536eaf7fca59a088d59bb39e82 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Antoine=20Beaupr=C3=A9?= <anarcat@debian.org>
+Date: Tue, 17 Oct 2023 10:27:30 -0400
+Subject: [PATCH] use a better heuristic for the apt update last run time
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+As reported in #182, `pkgcache.bin` gets updated on more than just
+`apt update`. In particular, it gets modified when a package is
+installed, upgraded or removed.
+
+So let's fallback on a better heuristic, which is the `APT::Periodic`
+timestamp. This should give us a much more precise and deliberate
+status.
+
+We also fallback to the `lists` directory, which gets updated when new
+mirror lists gets moved into place. This does run the risk of staying
+silently unchanged if there's no change on mirrors but (a) that's
+rather infrequent and (b) we might actually want to warn on that
+anyway.
+
+Signed-off-by: Antoine Beaupré <anarcat@debian.org>
+Closes: #183
+---
+ apt_info.py | 11 ++++++++++-
+ 1 file changed, 10 insertions(+), 1 deletion(-)
+
+Index: b/apt_info.py
+===================================================================
+--- a/apt_info.py	2023-10-23 15:01:19.018720385 -0400
++++ b/apt_info.py	2023-10-23 15:01:46.930720116 -0400
+@@ -8,6 +8,7 @@
+ # Author: Kyle Fazzari <kyrofa@ubuntu.com>
+ 
+ import apt
++import apt_pkg
+ import collections
+ import os
+ 
+@@ -80,10 +81,18 @@ def _write_autoremove_pending(cache):
+ def _write_cache_timestamps(registry):
+     print('# HELP apt_package_cache_timestamp_seconds Apt update last run time.')
+     print('# TYPE apt_package_cache_timestamp_seconds gauge')
++    apt_pkg.init_config()
++    if apt_pkg.config.find_b("APT::Periodic::Update-Package-Lists"):
++        # if we run updates automatically with APT::Periodic, we can
++        # check this timestamp file
++        stamp_file = "/var/lib/apt/periodic/update-success-stamp"
++    else:
++        # if not, let's just fallback on the lists directory
++        stamp_file = '/var/lib/apt/lists'
+     try:
+         print(
+             "apt_package_cache_timestamp_seconds %s"
+-            % os.stat('/var/cache/apt/pkgcache.bin').st_mtime
++            % os.stat(stamp_file).st_mtime
+         )
+     except OSError:
+         pass
diff -Nru prometheus-node-exporter-collectors-0.0~git20230203.6f710f8/debian/patches/series prometheus-node-exporter-collectors-0.0~git20230203.6f710f8/debian/patches/series
--- prometheus-node-exporter-collectors-0.0~git20230203.6f710f8/debian/patches/series	1969-12-31 19:00:00.000000000 -0500
+++ prometheus-node-exporter-collectors-0.0~git20230203.6f710f8/debian/patches/series	2023-10-31 13:57:52.000000000 -0400
@@ -0,0 +1,3 @@
+0001-do-not-run-apt-update-or-simulate-apt-dist-upgrade.patch
+0001-report-the-apt-cache-timestamp.patch
+0001-use-a-better-heuristic-for-the-apt-update-last-run-t.patch

Reply to: