
Re: Bug#1016937: atop: autopkgtest regression on arm64 and armhf and times out on s390x



Hi,

[tl;dr: atop seems to hang on s390x]

On 12-08-2022 12:23, Marc Haber wrote:
On Thu, Aug 11, 2022 at 10:51:32PM +0200, Paul Gevers wrote:
On 10-08-2022 12:03, Marc Haber wrote:
Unfortunately, this bug report suffers from multiple cut&paste or
template errors. The ci link points to the mercurial page for amd64, the
text alternates between s390x, armhf, arm64 and amd64.

There was only one that I'm aware of, the link to mercurial. But I
understand it if the text was a bit confusing.

You said autopkgtest fails on amd64, which was never the case. Maybe
amd64 and arm64 got confused.

What I *wanted* to convey is that arm64 and amd64 *failures* are in our RC policy and all other *regressions* are RC too. I did mix that up.

I tried the (dead simple) autopkgtest on the s390x and arm64 porterboxes
and it succeeded in a second's time. I have sharpened the expression
that counts the CPUs in lscpu's output and hope this will fix the issue.

Ooh, CPU count. Yes, some of those archs run on hosts with lots of CPUs.
armhf has 160, s390x has 10.
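
For illustration only: the exact expression used in the package's test is not shown in this thread, so the following is a sketch of one way to count CPUs from lscpu's output without also matching lines such as "On-line CPU(s) list:" or "NUMA node0 CPU(s):". The function name count_cpus_from_lscpu is hypothetical, and it assumes lscpu prints a line that starts with "CPU(s):".

```shell
#!/bin/bash

# Hypothetical helper (not taken from the atop package): read lscpu-style
# text on stdin and print the value of the line that *starts* with "CPU(s):".
# Anchoring the match at the start of the line skips other lines that
# merely contain "CPU(s)".
count_cpus_from_lscpu() {
    awk -F: '/^CPU\(s\):/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Intended use on a real system:
#   lscpu | count_cpus_from_lscpu
```

The anchored match is what keeps the count independent of how many additional "CPU(s)" lines a given host prints.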

I am testing locally on amd64 with a machine with 12 CPUs. The armhf
tests succeed (see
https://ci.debian.net/data/autopkgtest/testing/armhf/a/atop/24578667/log.gz).

Great, same on arm64. s390x still times out though.

The complete test is:
#!/bin/bash

# atop reports number of CPU and two extra lines
ATOPSOPINION="$(atop -P cpu 5 1 | grep -vE '^(RESET|SEP)' | wc -l)"

When I run `atop` manually (on stable), it doesn't do anything...
root@ci-worker-s390x-01:~# atop
^C

I started up a clean unstable lxc container, and installing atop takes quite some time between:
Created symlink /etc/systemd/system/timers.target.wants/atop-rotate.timer -> /lib/systemd/system/atop-rotate.timer.
Created symlink /etc/systemd/system/multi-user.target.wants/atop.service -> /lib/systemd/system/atop.service.
Created symlink /etc/systemd/system/multi-user.target.wants/atopacct.service -> /lib/systemd/system/atopacct.service.
and
Could not execute systemctl:  at /usr/bin/deb-systemd-invoke line 145.

Running atop from unstable also hangs:
root@elbrus:~# atop
^C

There is no loop, and nothing that could fail on a big number. In my
understanding, this could run on a box with 2000 cores and still work.

Except, it doesn't. Seems like atop is seriously broken on s390x on the hosts that we have.
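
To tell a real hang apart from a slow start, the invocation can be wrapped in a time limit instead of being interrupted by hand. This is a minimal sketch, assuming coreutils' timeout is available; the helper name run_with_limit and the 30-second limit in the usage comment are arbitrary choices of mine and are not part of the package's test.

```shell
#!/bin/bash

# Hypothetical helper: run a command under a time limit (in seconds) and
# report whether it finished or had to be killed.
# coreutils' timeout exits with status 124 when the limit is reached.
run_with_limit() {
    local limit="$1"
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
    local rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${limit}s: $*"
    else
        echo "exited with status ${rc}: $*"
    fi
    return "$rc"
}

# Intended use, with the command from the test script:
#   run_with_limit 30 atop -P cpu 5 1
```

With an interval of 5 seconds and a single sample, that atop command should normally finish well inside such a limit, so a "timed out" line on s390x would support the hang diagnosis.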

Also, the test does not time out on zelenka when manually invoked in a
schroot (setting PATH to point to an executable atop is necessary, as it
does not seem to be possible to install an arbitrary package that is not
in the archive). Also, the test is successful if invoked after installing
atop 2.7.1-2 from the archive.

Maybe we need to involve the s390x porters? I have put them in CC to draw their attention already.

Paul
