
Re: Please disable "needs-internet" on riscv64 runner



Hi Ian,

Thanks for reporting this issue for riscv64 runners again.

I am a Debian RISC-V porter and I currently help maintain these
riscv64 runners.

On Mon, May 26, 2025 at 8:46 PM Ian Jackson
<ijackson@chiark.greenend.org.uk> wrote:
>
> Hi again.  (I'm moving this conversation from #debci to allow for a
> more systematic and formal approach.)
>
> I have a test which is failing consistently only on riscv64:
>   https://tracker.debian.org/pkg/chiark-tcl
>
> As I understand it, the underlying factors that cause this situation
> are:
>
>  * My test case does an AAAA lookup for a domain name on the public
>    internet that is expected to have an AAAA RR. [1]
>
>  * Therefore my test declares a "needs-internet" Restriction.
>
>  * The administrators of the network environment for the riscv64 test
>    runner have arranged for their resolvers to filter out AAAA
>    records.  (I don't know the reason for this, but there surely is
>    one.  Perhaps there is a problem with IPv6 connectivity.)
>

The situation should be improved now: the network administrators
have enabled AAAA record resolution, see
https://ci.debian.net/packages/c/chiark-tcl/testing/riscv64/

I made a mistake when I first tried to verify this. I downloaded
chiark-tcl 1.3.7 [0] manually and ran autopkgtest on one of the debci
runners. With `autopkgtest-unstable-riscv64` it passed, but with
`autopkgtest-testing-riscv64` it failed with the same log as in
#1106452 [1]. The reason is that I launched autopkgtest with
`--no-built-binaries`. According to the chiark-tcl tracker page [2],
1.3.7 has not migrated to testing yet, so autopkgtest installed the
1.3.5 binaries there.

[0]: https://deb.debian.org/debian/pool/main/c/chiark-tcl/chiark-tcl_1.3.7.dsc
[1]: https://bugs.debian.org/1106452
[2]: https://tracker.debian.org/pkg/chiark-tcl
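
For anyone who wants to check from a runner whether AAAA answers
actually arrive, here is a minimal Python sketch. It is only an
illustration and not what the chiark-tcl test does: the function name
`ipv6_addresses` and its `resolver` parameter are made up for this
example, and the lookup goes through the system resolver, so local
configuration such as /etc/hosts can influence the result.

```python
import socket


def ipv6_addresses(hostname, resolver=socket.getaddrinfo):
    """Return the sorted IPv6 addresses the system resolver gives for hostname.

    An empty list means no IPv6 address came back (filtered or absent).
    `resolver` is a hypothetical hook, injectable only for offline testing.
    """
    try:
        # Restricting the family to AF_INET6 asks for IPv6 results only.
        infos = resolver(hostname, None, socket.AF_INET6)
    except socket.gaierror:
        # Name not found, or no address of the requested family.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual address. The set removes duplicates
    # that differ only in socket type.
    return sorted({info[4][0] for info in infos})
```

Calling `ipv6_addresses()` with the name a test looks up, once inside
the testing environment and once inside the unstable one, shows
whether the two environments see the same answers.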

>  * Nevertheless the test runner is willing to run tests which declare
>    "needs-internet", and declares a regression if they fail.
>
> Additionally, I was told on IRC:
>
>  * The riscv64 test runner is behind the Great Firewall of China.
>
>  * Initially I was told to retry my failing test.  I infer that test
>    failures due to the GFW are not uncommon, but normally stochastic.
>
> Options:
>
>  1. We could ask the network administrator for the riscv64 runner for
>     help and/or a better workaround for whatever the underlying issue
>     is.

We are working on this. In fact, it should not be very hard for our
excellent network admin, since the cause is some configuration
conflicts on our side. Once we understand what the community needs,
we will address it ASAP.

>
>  2. I could mark my test as flaky, or remove it, or mark it is not for
>     execution on riscv64.
>
>  3. I could ask the Release Team for an unblock.
>
>  4. We could invent a new Restriction "needs-reliable-internet" or
>     "needs-internet-ipv6" or some such, and I could declare that in my
>     test case, and we could offer it on all the runners with
>     satisfactory networking.
>
>  5. We could skip "needs-internet" tests on riscv64, or treat those
>     failures as nonblocking.
>
>  6. We could attempt to find a new riscv64 test runner host that has a
>     reliable internet connection.

We are working on this as well, but it will take time. Please note
that finding a single place to host this many riscv64 runners is not
realistic in the short term.

>
>  7. We could drop riscv64 as a blocker for testing migration.
>
> Analysis:
>
...
> I conclude that option 5 is clearly correct, at least as an interim
> measure.  It correctly describes the real situation.  If the
> networking environment can be improved, then that change can be
> reverted.
>
In fact, we hope to keep `needs-internet` on the riscv64 runners
because it is very useful. Besides the chiark-tcl case, we have other
network access issues as well, and the annoying part is that the
network setup differs somewhat from runner to runner. The good news is
that we are unifying the network environments.


> When I first proposed on IRC that this runner should skip tests that
> declare "needs-internet", I was told:
>  | it works most of the time, so I rather not
>
> I find this response extremely surprising.  I don't think "works most
> of the time" is good enough.
>

I think what elbrus meant is this: we have been running debci on
riscv64 for almost three years and have hit some issues, but the
number of issues is very small compared with the size of the runner
cluster, and some of them can be addressed soon.

I will summarize our current riscv64 runners in another email thread
on debian-ci@l.d.o and update it at least quarterly.

BR,
Bo


