
Bug#1114459: pmix breaks slurm-wlm autopkgtest: causes test to time out



Source: pmix, slurm-wlm
Control: found -1 pmix/6.0.0-3
Control: found -1 slurm-wlm/24.11.5-4
Severity: serious
Tags: sid trixie
User: debian-ci@lists.debian.org
Usertags: breaks needs-update

Dear maintainer(s),

With a recent upload of pmix, the autopkgtest of slurm-wlm fails in testing when that autopkgtest is run with the binary packages of pmix from unstable. It times out after about 2 h 47 min, whereas it normally takes only minutes. It passes when run with only packages from testing. In tabular form:

                       pass            fail
pmix                   from testing    6.0.0-3
slurm-wlm              from testing    24.11.5-4
all others             from testing    from testing

I copied some of the output at the bottom of this report.

Currently this regression is blocking the migration of pmix to testing [1]. Due to the nature of this issue, I filed this bug report against both packages. Can you please investigate the situation and reassign the bug to the right package?

More information about this bug and the reason for filing it can be found on
https://wiki.debian.org/ContinuousIntegration/RegressionEmailInformation

Paul

[1] https://qa.debian.org/excuses.php?package=pmix

https://ci.debian.net/data/autopkgtest/testing/amd64/s/slurm-wlm/64130829/log.gz

383s ● slurmctld.service - Slurm controller daemon
383s      Loaded: loaded (/usr/lib/systemd/system/slurmctld.service; enabled; preset: enabled)
383s      Active: active (running) since Fri 2025-09-05 03:51:07 UTC; 10s ago
383s  Invocation: 612aa5cddd6f46faaa0671f23b1f95eb
383s        Docs: man:slurmctld(8)
383s    Main PID: 3312 (slurmctld)
383s       Tasks: 88
383s      Memory: 5.2M (peak: 9M)
383s         CPU: 84ms
383s      CGroup: /system.slice/slurmctld.service
383s              ├─3312 /usr/sbin/slurmctld --systemd
383s              └─3379 "slurmctld: slurmscriptd"
383s
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmctld[3312]: slurmctld: No job state file (/var/lib/slurm/slurmctld/job_state.old) to recover
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmctld[3312]: slurmctld: error: Could not open reservation state file /var/lib/slurm/slurmctld/resv_state: No such file or directory
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmctld[3312]: slurmctld: error: NOTE: Trying backup state save file. Reservations may be lost
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmctld[3312]: slurmctld: No reservation state file (/var/lib/slurm/slurmctld/resv_state.old) to recover
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmctld[3312]: slurmctld: error: Could not open trigger state file /var/lib/slurm/slurmctld/trigger_state: No such file or directory
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmctld[3312]: slurmctld: error: NOTE: Trying backup state save file. Triggers may be lost!
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmctld[3312]: slurmctld: No trigger state file (/var/lib/slurm/slurmctld/trigger_state.old) to recover
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmctld[3312]: slurmctld: read_slurm_conf: backup_controller not specified
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmctld[3312]: slurmctld: Reinitializing job accounting state
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmctld[3312]: slurmctld: Running as primary controller
383s ● slurmd.service - Slurm node daemon
383s      Loaded: loaded (/usr/lib/systemd/system/slurmd.service; enabled; preset: enabled)
383s      Active: active (running) since Fri 2025-09-05 03:51:07 UTC; 10s ago
383s  Invocation: 91f78f38727e43a1b6b612ea9ff72296
383s        Docs: man:slurmd(8)
383s    Main PID: 3406 (slurmd)
383s       Tasks: 12
383s      Memory: 2.2M (peak: 3.8M)
383s         CPU: 62ms
383s      CGroup: /system.slice/slurmd.service
383s              └─3406 /usr/sbin/slurmd --systemd
383s
383s Sep 05 03:51:07 ci-248-6c8bbe56 systemd[1]: Starting slurmd.service - Slurm node daemon...
383s Sep 05 03:51:07 ci-248-6c8bbe56 (slurmd)[3406]: slurmd.service: Referenced but unset environment variable evaluates to an empty string: SLURMD_OPTIONS
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmd[3406]: slurmd: _read_slurm_cgroup_conf: No cgroup.conf file (/etc/slurm/cgroup.conf), using defaults
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmd[3406]: _read_slurm_cgroup_conf: No cgroup.conf file (/etc/slurm/cgroup.conf), using defaults
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmd[3406]: slurmd: error: Node configuration differs from hardware: CPUs=1:64(hw) Boards=1:1(hw) SocketsPerBoard=1:1(hw) CoresPerSocket=1:32(hw) ThreadsPerCore=1:2(hw)
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmd[3406]: slurmd: CPU frequency setting not configured for this node
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmd[3406]: slurmd: slurmd version 24.11.5 started
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmd[3406]: slurmd: slurmd started on Fri, 05 Sep 2025 03:51:07 +0000
383s Sep 05 03:51:07 ci-248-6c8bbe56 systemd[1]: Started slurmd.service - Slurm node daemon.
383s Sep 05 03:51:07 ci-248-6c8bbe56 slurmd[3406]: slurmd: CPUs=1 Boards=1 Sockets=1 Cores=1 Threads=1 Memory=257333 TmpDisk=256000 Uptime=1279 CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null)
383s PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
383s test*        up   infinite      1   idle localhost
383s NODELIST NODES PARTITION STATE
383s localhost 1 test* idle
10374s autopkgtest [06:37:48]: ERROR: timed out on command "su -s /bin/bash root -c set -e; exec /tmp/autopkgtest-lxc.a9795u61/downtmp/wrapper.sh --artifacts=/tmp/autopkgtest-lxc.a9795u61/downtmp/mpi-artifacts --chdir=/tmp/autopkgtest-lxc.a9795u61/downtmp/build.CW7/src --env=AUTOPKGTEST_TESTBED_ARCH=amd64 --env=AUTOPKGTEST_TEST_ARCH=amd64 --env=DEB_BUILD_OPTIONS=parallel=64 --env=DEBIAN_FRONTEND=noninteractive --env=LANG=C.UTF-8 --unset-env=LANGUAGE --unset-env=LC_ADDRESS --unset-env=LC_ALL --unset-env=LC_COLLATE --unset-env=LC_CTYPE --unset-env=LC_IDENTIFICATION --unset-env=LC_MEASUREMENT --unset-env=LC_MESSAGES --unset-env=LC_MONETARY --unset-env=LC_NAME --unset-env=LC_NUMERIC --unset-env=LC_PAPER --unset-env=LC_TELEPHONE --unset-env=LC_TIME --script-pid-file=/tmp/autopkgtest_script_pid --source-profile --stderr=/tmp/autopkgtest-lxc.a9795u61/downtmp/mpi-stderr --stdout=/tmp/autopkgtest-lxc.a9795u61/downtmp/mpi-stdout --tmp=/tmp/autopkgtest-lxc.a9795u61/downtmp/autopkgtest_tmp --env=AUTOPKGTEST_NORMAL_USER=debci --env=ADT_NORMAL_USER=debci --make-executable=/tmp/autopkgtest-lxc.a9795u61/downtmp/build.CW7/src/debian/tests/mpi -- /tmp/autopkgtest-lxc.a9795u61/downtmp/build.CW7/src/debian/tests/mpi" (kind: test)
10374s autopkgtest [06:37:48]: test mpi
