Re: Bug#985617: glibc: flaky autopkgtest on most architectures

To: Paul Gevers <elbrus@debian.org>
Cc: Aurelien Jarno <aurelien@aurel32.net>, 985617@bugs.debian.org, Debian CI team <debian-ci@lists.debian.org>
Subject: Re: Bug#985617: glibc: flaky autopkgtest on most architectures
From: Simon McVittie <smcv@debian.org>
Date: Sun, 25 Apr 2021 10:39:15 +0100
Message-id: <YIU4w75OQ0bX0siQ@momentum.pseudorandom.co.uk>
In-reply-to: <YIUzApuh8wYy5ZNt@momentum.pseudorandom.co.uk>
References: <69ba0085-8c72-a856-0b8c-a9b443343045@debian.org> <69ba0085-8c72-a856-0b8c-a9b443343045@debian.org> <ce18b195-e1fa-f28f-be2e-0782a3d3db06@debian.org> <bd93b9cf-b093-5985-0a38-a5403f4b8768@debian.org> <YILCUTdRTJQSrpEW@aurel32.net> <ce18b195-e1fa-f28f-be2e-0782a3d3db06@debian.org> <c97419c6-1709-4228-4a74-bd262e8de49e@debian.org> <YISwADPT48A5vanu@aurel32.net> <8040fc8b-7355-2f99-4509-ce6a572e04bf@debian.org> <YIUzApuh8wYy5ZNt@momentum.pseudorandom.co.uk>

On Sun, 25 Apr 2021 at 10:14:51 +0100, Simon McVittie wrote:
> On Sun, 25 Apr 2021 at 08:11:48 +0200, Paul Gevers wrote:
> > On 25-04-2021 01:55, Aurelien Jarno wrote:
> > > It appears that all the failures are related to containers. I have been
> > > able to reproduce the issue with a bullseye kernel, which defaults to
> > > kernel.unprivileged_userns_clone=1. It seems the autopkgtest runners
> > > still use a buster kernel (at least in the case of this build log).

Looking at support/test-container.c, it seems that these tests will
automatically be skipped (FAIL_UNSUPPORTED) on a kernel that restricts
userns creation (like buster), and will be run (and perhaps fail)
on a kernel that does not (like bullseye). So it is not necessarily
a *regression* that they fail - they might just never have been tried
before we started using bullseye kernels.
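
To illustrate the kind of check involved, here is a minimal sketch that
reads the sysctl mentioned above from procfs. It is not taken from
support/test-container.c: the helper names are hypothetical, and it only
looks at this one Debian-specific knob, not at any other way userns
creation might be restricted.

```c
#include <stdio.h>

/* Hypothetical helper, not taken from support/test-container.c.
   Read an integer sysctl value from PATH.  Returns the value, or -1 if
   the file is missing or cannot be parsed (for example on a kernel
   that does not have this knob at all).  */
static int
read_sysctl_int (const char *path)
{
  FILE *f = fopen (path, "r");
  if (f == NULL)
    return -1;

  int value;
  int ok = fscanf (f, "%d", &value) == 1;
  fclose (f);
  return ok ? value : -1;
}

/* Returns 1 if the knob at PATH says unprivileged userns creation is
   disabled (value 0), and 0 otherwise.  A missing knob is treated as
   "not restricted"; this is an assumption of the sketch.  */
static int
userns_looks_restricted (const char *path)
{
  return read_sysctl_int (path) == 0;
}
```

A caller would pass "/proc/sys/kernel/unprivileged_userns_clone" and
report the test as unsupported when the result is 1.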

The brute-force approach to making the autopkgtest not be flaky would be
to make these tests FAIL_UNSUPPORTED unconditionally, which will result
in the same coverage we would have had on buster kernels. Obviously it
would be better if they could be made to pass, but some reliable testing
is better than none.

These tests seem to be failing here (support/test-container.c:1095):

  execvp (new_child_proc[0], new_child_proc);

  /* Or don't run the child?  */
  FAIL_EXIT1 ("Unable to exec %s\n", new_child_proc[0]);

It would be useful if this printed strerror(errno) at least, so that we
can see whether it's ENOENT or EACCES or something else.
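
As a rough standalone sketch of what I mean (the helper names here are
made up for illustration and are not part of glibc's support code), the
errno from a failed execvp can be captured and turned into a message
like this:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: format a diagnostic for a failed exec of PROG
   that includes the errno text, so ENOENT and EACCES can be told
   apart in the log.  Returns what snprintf returns.  */
static int
format_exec_failure (char *buf, size_t len, const char *prog, int err)
{
  return snprintf (buf, len, "Unable to exec %s: %s (errno %d)",
                   prog, strerror (err), err);
}

/* Hypothetical helper: try to exec ARGV[0].  This only returns if
   execvp failed, in which case the errno value is returned.  */
static int
try_exec (char *const argv[])
{
  execvp (argv[0], argv);
  return errno;
}
```

In test-container.c itself the equivalent change would presumably be to
pass strerror (errno) as an extra "%s" argument to the existing
FAIL_EXIT1 call, assuming nothing between execvp and the message
clobbers errno.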

Perhaps the test support code is not copying/mounting everything that needs
to be copied/mounted into the container's filesystem? More debug logging in
support/test-container.c would probably be helpful here - perhaps even
running 'find . -ls' in the new_root_path before chrooting into it?
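
If spawning find from the test harness is awkward, something along these
lines could do a similar job in-process. This is only a sketch:
dump_tree is a hypothetical helper, not an existing function in
support/test-container.c, and it prints just the mode, size and path
rather than everything 'find -ls' shows.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical debugging helper.  Recursively print mode, size and
   path of everything below DIR to OUT, roughly like "find DIR -ls".
   Returns the number of entries printed.  */
static int
dump_tree (FILE *out, const char *dir)
{
  DIR *d = opendir (dir);
  if (d == NULL)
    {
      fprintf (out, "opendir %s: %s\n", dir, strerror (errno));
      return 0;
    }

  int count = 0;
  struct dirent *e;
  while ((e = readdir (d)) != NULL)
    {
      if (strcmp (e->d_name, ".") == 0 || strcmp (e->d_name, "..") == 0)
        continue;

      char path[4096];
      int n = snprintf (path, sizeof path, "%s/%s", dir, e->d_name);
      if (n < 0 || (size_t) n >= sizeof path)
        continue;   /* Path too long for this sketch; skip it.  */

      struct stat st;
      if (lstat (path, &st) != 0)
        {
          fprintf (out, "lstat %s: %s\n", path, strerror (errno));
          continue;
        }

      fprintf (out, "%06o %10lld %s\n", (unsigned int) st.st_mode,
               (long long) st.st_size, path);
      count++;

      /* lstat does not follow symlinks, so only real directories are
         descended into and symlink loops are not traversed.  */
      if (S_ISDIR (st.st_mode))
        count += dump_tree (out, path);
    }
  closedir (d);
  return count;
}
```

Calling dump_tree (stderr, new_root_path) just before the chroot would
put the container's contents into the test log.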

    smcv
