Please disable "needs-internet" on riscv64 runner
Hi again. (I'm moving this conversation from #debci to allow for a
more systematic and formal approach.)
I have a test which is failing consistently only on riscv64:
https://tracker.debian.org/pkg/chiark-tcl
As I understand it, the underlying factors that cause this situation
are:
* My test case does an AAAA lookup for a domain name on the public
internet that is expected to have an AAAA RR. [1]
* Therefore my test declares a "needs-internet" Restriction.
* The administrators of the network environment for the riscv64 test
runner have arranged for their resolvers to filter out AAAA
records. (I don't know the reason for this, but there surely is
one. Perhaps there is a problem with IPv6 connectivity.)
* Nevertheless the test runner is willing to run tests which declare
"needs-internet", and declares a regression if they fail.
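To make the failure mode concrete, here is a minimal Python sketch of the kind of lookup involved. It is not the chiark-tcl test itself. The function name aaaa_addresses and its injectable resolve parameter are hypothetical, and exist only so the behaviour can be checked without network access. When a resolver filters out AAAA records, getaddrinfo() restricted to AF_INET6 will typically fail with socket.gaierror. The lookup then reports no addresses, which is the kind of failure the riscv64 runner produces.

```python
import socket


def aaaa_addresses(host, resolve=socket.getaddrinfo):
    """Return the sorted IPv6 addresses found for host, or [] if none resolve.

    Hypothetical helper: `resolve` defaults to the real socket.getaddrinfo
    but can be replaced with a stub for offline testing.
    """
    try:
        # Ask only for IPv6 results, i.e. addresses derived from AAAA RRs.
        infos = resolve(host, None, socket.AF_INET6, socket.SOCK_STREAM)
    except socket.gaierror:
        # No AAAA RR came back: either the name has none, or the resolver
        # filtered it out. From inside the test the two look the same.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); for AF_INET6
    # the sockaddr is (address, port, flowinfo, scope_id).
    return sorted({info[4][0] for info in infos})
```

On a host with working IPv6 DNS, calling this for a name that is expected to have an AAAA RR should return a non-empty list. Behind a resolver that strips AAAA records it returns [], and a test asserting on that result fails every time, not intermittently.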
Additionally, I was told on IRC:
* The riscv64 test runner is behind the Great Firewall of China.
* Initially I was told to retry my failing test. I infer that test
failures due to the GFW are not uncommon, but normally stochastic.
Options:
1. We could ask the network administrator for the riscv64 runner for
help and/or a better workaround for whatever the underlying issue
is.
2. I could mark my test as flaky, or remove it, or mark it as not for
execution on riscv64.
3. I could ask the Release Team for an unblock.
4. We could invent a new Restriction "needs-reliable-internet" or
"needs-internet-ipv6" or some such, and I could declare that in my
test case, and we could offer it on all the runners with
satisfactory networking.
5. We could skip "needs-internet" tests on riscv64, or treat those
failures as nonblocking.
6. We could attempt to find a new riscv64 test runner host that has a
reliable internet connection.
7. We could drop riscv64 as a blocker for testing migration.
Analysis:
Options 1 and 6 seem unlikely to bear fruit in a reasonable
timeframe. (Option 1 seems to presume that the network administrator
doesn't have a good reason for filtering out AAAA RRs, but that seems
doubtful.)
Options 2 and 3 are obviously wrong. They're putting the workaround
in the wrong place.
Option 7 is a sledgehammer to crack a nut.
As for option 4, my starting point is that "needs-internet" ought to
imply working IPv6 (including both connectivity and DNS resolution),
since that's the current version of the Internet Protocol. In some
parts of the world, it is the most common version of IP in use.
Probably "needs-internet" ought to continue to imply IPv4 connectivity
(at least via some kind of gateway) for the foreseeable future.
Anyway, apparently there are other, intermittent, problems with the
networking on this host, due to Chinese internet censorship.
I think it is very undesirable for tests to be flaky. Flaky tests,
especially flaky infrastructure, are a great practical nuisance.
Flaky test infrastructure is corrosive to trust, and leads to a
culture of not treating actual heisenbugs as real or important.
I doubt that any host behind the Great Firewall could be regarded as
having a reliable network connection.
So IMO option 4 is no good, even if we wanted to try to persuade the
autopkgtest maintainers to (re)define "needs-internet", narrowly.
I conclude that option 5 is clearly correct, at least as an interim
measure. It correctly describes the real situation. If the
networking environment can be improved, then that change can be
reverted.
When I first proposed on IRC that this runner should skip tests that
declare "needs-internet", I was told:
| it works most of the time, so I rather not
I find this response extremely surprising. I don't think "works most
of the time" is good enough.
The CI software should operate on the basis of what's actually true,
not what we would like to be true. "needs-internet" is not satisfied
on our current riscv64 runner and we shouldn't pretend that it is.
Thanks for your attention.
Ian.
--
Ian Jackson <ijackson@chiark.greenend.org.uk> These opinions are my own.
Pronouns: they/he. If I emailed you from @fyvzl.net or @evade.org.uk,
that is a private address which bypasses my fierce spamfilter.