Re: Builds that pass locally but fail on sbuild? (Re: Reviving schroot as used by sbuild)
On Thu, 27 Jun 2024 at 19:56:43 -0700, Otto Kekäläinen wrote:
> Could you point me to some Debian Bug # or otherwise share examples of
> cases when a build succeeded locally but failed on official Debian
> builders due to something that is specific for sbuild/schroot?
I can't easily point you to a Debian bug number, because I try to only
upload packages that live up to Debian's quality standards, which means
I've been routinely building packages for upload in sbuild/schroot for
several years; so if a package fails in that situation, I do not upload,
and retry as many times as it takes to get it right.
(I'm sure I've failed to do that several times, but I'm sorry, I mostly
can't remember specific instances or bug numbers; I generally try to fix
the regression as quickly as I can.)
But, some examples of packages and the reasons they fail:
- bubblewrap, repeatedly. Its test suite wants to create new user
and filesystem namespaces, which is unconditionally not allowed by
the kernel while inside a chroot (because the kernel doesn't want to
allow filesystem namespaces to be used to escape from a chroot). The
relevant tests have to be skipped in situations where they can't work.
"Real" container managers that use pivot_root() instead of chroot(),
such as Docker and Podman, sometimes allow creation of nested user
namespaces (like bwrap by default, and docker --privileged), sometimes
deny it (like bwrap --disable-userns, and Docker by default), and
sometimes cannot allow it because some larger factor forces their hand:
it's non-obvious what will work.
The conditions for not being allowed to create new namespaces are
relatively complicated and poorly documented, and the error reporting is
minimal (two or three errno values have to cover every possible failure
mode), so this is something that has to be done by trial and error.
Until recently, DSA'd machines all used
/proc/sys/kernel/unprivileged_userns_clone to disable unprivileged
creation of user namespaces anyway. This restriction has presumably
been lifted for the buildds that use sbuild in unshare mode.
- xdg-desktop-portal, repeatedly. Its test suite uses FUSE, which is
disabled (the module is prevented from loading) on official Debian
buildds as a security hardening mechanism, even though on typical
end-user or server Debian systems it works fine.
This is one that I did have to find out via FTBFS, because I don't yet
have a local build environment that replicates this restriction. I know
that I should, and it's on my list.
- ostree, at least once. The test suite historically assumed that /var/tmp
supports extended attributes, which is not true on all buildds (ordinary
on-disk filesystems usually do support them, but tmpfs doesn't or didn't
until recently, and some buildds with plenty of RAM operate in a tmpfs
root filesystem to speed up their builds).
- flatpak, repeatedly. Same as bubblewrap, ostree and xdg-desktop-portal, combined.
- dbus, historically. For a long time, when using the non-default
DBUS_COOKIE_SHA1 authentication mechanism, libdbus ignored $HOME and
instead used the "official" home directory from /etc/passwd
(the equivalent of `getent passwd $(id -u) | cut -d: -f6`). Official
buildds set the user's home directory to /nonexistent, so this fails.
In production use, dbus normally uses EXTERNAL over AF_UNIX (and doesn't
even allow DBUS_COOKIE_SHA1, as a piece of security hardening), but in
its build-time tests it specifically exercises each auth mechanism and
each transport, including DBUS_COOKIE_SHA1 over TCP (which is a
terrible idea on Unix but is unfortunately necessary on Windows).
- GLib, ongoing (#972151). When the GLib test suite tests interoperability
with libdbus, it (IMO reasonably!) expects ("localhost", AF_INET) to
resolve to 127.0.0.1, but that doesn't work on IPv6-only buildds for
relatively complicated reasons involving subtleties of glibc resolver
behaviour (#952740). My local build environment still doesn't have code
to reproduce this, and I'm sorry that I haven't provided workarounds or
fixes in the GLib test suite or in libdbus' discouraged TCP code paths.
If someone wants to work on this, skipping the interop tests for TCP on
IPv6-only buildds would probably be more proportionate than adjusting
libdbus' name-resolution behaviour for a feature nobody should be
using in production anyway.
- Any package that assumes that if $XDG_RUNTIME_DIR is set, then it is
set to a usable value (because historically schroot would set it to
a value that exists/works on the host system, but does not exist and
cannot be created inside the container). This is worked around by
individual packages unsetting XDG_RUNTIME_DIR or setting it to a more
useful value, or automatically by recent debhelper in a sufficiently
high compat level (#942111).
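To make the list above a bit more concrete, here is a rough sketch of how a test suite might probe for these restrictions before deciding which tests to skip. It is not taken from any of the packages mentioned: all the function names are made up for illustration, the choice of the "user." xattr namespace and of opening /dev/fuse as a FUSE check are assumptions, and a probe that succeeds does not guarantee that the real test will.

```python
import os
import pwd
import socket
import subprocess
import sys
import tempfile

CLONE_NEWUSER = 0x10000000  # value from <linux/sched.h>

# Runs in a throwaway child so the calling process never changes namespace.
_USERNS_PROBE = """
import ctypes, sys
libc = ctypes.CDLL(None, use_errno=True)
if libc.unshare(%d) != 0:
    sys.exit(ctypes.get_errno() or 1)
""" % CLONE_NEWUSER


def can_create_userns() -> bool:
    """Trial and error: try to unshare a user namespace in a child process."""
    try:
        result = subprocess.run([sys.executable, "-c", _USERNS_PROBE], check=False)
    except OSError:
        return False
    return result.returncode == 0


def fuse_device_opens() -> bool:
    """Assumption: if /dev/fuse cannot be opened, FUSE tests cannot work."""
    try:
        fd = os.open("/dev/fuse", os.O_RDWR)
    except OSError:
        return False
    os.close(fd)
    return True


def supports_user_xattrs(directory: str) -> bool:
    """Try to set a user.* xattr on a temporary file in the given directory."""
    if not hasattr(os, "setxattr"):
        return False
    try:
        with tempfile.NamedTemporaryFile(dir=directory) as f:
            os.setxattr(f.name, "user.probe", b"1")
    except OSError:
        return False
    return True


def passwd_home_usable() -> bool:
    """Check the home directory from the passwd database, ignoring $HOME."""
    try:
        home = pwd.getpwuid(os.getuid()).pw_dir
    except KeyError:
        return False
    return os.path.isdir(home) and os.access(home, os.W_OK)


def localhost_has_ipv4() -> bool:
    """Check whether ("localhost", AF_INET) resolves to 127.0.0.1."""
    try:
        infos = socket.getaddrinfo("localhost", None, socket.AF_INET)
    except OSError:
        return False
    return any(info[4][0] == "127.0.0.1" for info in infos)


def xdg_runtime_dir_is_broken() -> bool:
    """True if XDG_RUNTIME_DIR is set but is not a writable directory."""
    path = os.environ.get("XDG_RUNTIME_DIR")
    if path is None:
        return False
    return not (os.path.isdir(path) and os.access(path, os.W_OK))
```

A test harness could call, for example, supports_user_xattrs("/var/tmp") and skip the xattr-dependent tests when it returns False, rather than failing the whole build.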
> I have never run in such a situation despite doing Debian packaging
> for 10 years with fairly complex C++ software targeting all archs
> Debian supports.
If your complex C++ software is doing pure computation without
side-effects, or if it's doing something that's unaffected by being in
a chroot (like file I/O to the build directory, or IPC via AF_UNIX)
then it can be extremely complex and still not hit this sort of thing.
Conversely, container-adjacent tools that want to run build-time tests
will hit this sort of thing every time.
smcv