gpuenv-utils with multiple podman workers

To: Christian Kastner <ckk@debian.org>
Cc: Debian ROCm Team <debian-ai@lists.debian.org>
Subject: gpuenv-utils with multiple podman workers
From: Cordell Bloor <cgmb@slerp.xyz>
Date: Mon, 21 Jul 2025 04:30:40 -0600
Message-id: <be8da5c8-4511-482e-aa1f-18c741fd1e89@slerp.xyz>

Hi Christian,

I have acquired a few more RDNA1 GPUs to ensure that we have at least two workers for each of gfx1010, gfx1011, and gfx1012 on ci.rocm.debian.net. To achieve this, I'm trying to increase the number of podman workers to two or three per host, but I'm running into a problem.

The pretest acquires all GPUs on the system and attempts to lock them with gpuenv-utils. The first worker locks both GPUs, and the second worker then times out when it tries to do the same. This locking is done by /usr/share/debci/util/pre-test. I could remove the call to pre-test, but the health check is very useful for preventing a broken worker node from consuming the entire job queue and reporting every job as failed.

To control access to the GPUs, I've set environment variables in the autopkgtest arguments for each worker. For the first worker, I use --env=ROCR_VISIBLE_DEVICES=0 and for the second worker I use --env=ROCR_VISIBLE_DEVICES=1. I suppose I would also need to communicate this restriction to the pretest somehow.
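
As a rough illustration of the kind of restriction I mean, here is a minimal Python sketch. It is not how pre-test or gpuenv-utils currently work, and the function name devices_to_lock is purely hypothetical. It assumes the pretest could see the same ROCR_VISIBLE_DEVICES value as the worker and that the value is a comma-separated list of integer indices (UUID-style entries are ignored here).

```python
import os


def devices_to_lock(all_devices, env=None):
    """Return the device indices this worker should try to lock.

    Hypothetical helper: if ROCR_VISIBLE_DEVICES is unset, fall back to
    every device on the host (the behaviour described above); otherwise
    keep only the listed integer indices that exist on this host.
    """
    env = os.environ if env is None else env
    value = env.get("ROCR_VISIBLE_DEVICES")
    if value is None:
        return list(all_devices)

    visible = []
    for token in value.split(","):
        token = token.strip()
        # Only plain integer indices are handled in this sketch.
        if token.isdigit() and int(token) in all_devices:
            visible.append(int(token))
    return visible


# Two workers on a host with two GPUs would then lock disjoint sets.
worker1 = devices_to_lock([0, 1], {"ROCR_VISIBLE_DEVICES": "0"})
worker2 = devices_to_lock([0, 1], {"ROCR_VISIBLE_DEVICES": "1"})
```

With something along these lines, each worker would only contend for its own GPU, and the health check could stay in place.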


Sincerely,
Cory Bloor

Follow-Ups:
- Re: gpuenv-utils with multiple podman workers
  - From: Christian Kastner <ckk@debian.org>
