
Re: Preparing Argo and Lyra for the CI (Was: Preparing Ursa and Lyra for the CI)



Hi Cory,

On 2023-11-14 20:57, Cordell Bloor wrote:
> Lyra is now functional and stable. This server has four workers, each
> with 8 CPU cores, 30 GiB of RAM and one MI25 GPU (Vega 10; gfx900). The
> PCIe passthrough works flawlessly with qemu, though I am using a
> workaround to enable "-cpu host" until a debci issue is resolved [1].
> Lyra is quite slow, but the system is rock solid.

I'm really, really happy to hear that the QEMU backend is working out so
well. There was very little reliable information on GPU pass-through for
compute (as opposed to graphics), so I'm quite relieved to hear that my
working setup with Navi21 was not just a fluke.

Best,
Christian

> rocBLAS is failing due to an alarm timeout [2]. This appears to be
> because rocblas-test is using a timer to abort the program if the test
> is not completed fast enough, under the assumption that the test must be
> deadlocked. I'm tempted to patch the timer out of rocblas-test since a
> deadlock would eventually be caught by the autopkgtest timeout anyway.
> 
> Now that Lyra has stabilized, the gfx900 tests on the CI should be a
> useful resource about problems in ROCm packages on Debian.
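
To make the timer behaviour described above concrete, here is a minimal
Python sketch of an alarm-based watchdog. It is not the rocblas-test code
(which is not shown in this thread); run_with_watchdog and WatchdogTimeout
are hypothetical names, and the time budgets are arbitrary. It only
illustrates why a correct but slow test on a slow host can look like a
deadlock to such a timer.

```python
import signal
import time


class WatchdogTimeout(Exception):
    """Raised by the alarm handler when the time budget is exhausted."""


def _on_alarm(signum, frame):
    raise WatchdogTimeout("test exceeded its time budget")


def run_with_watchdog(test_fn, budget_seconds):
    """Run test_fn and report 'timed out' if it exceeds budget_seconds.

    Unix only, and must be called from the main thread, because it
    relies on SIGALRM. All names here are illustrative assumptions.
    """
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, budget_seconds)
    try:
        test_fn()
        return "passed"
    except WatchdogTimeout:
        # The watchdog cannot tell a deadlock from a slow machine.
        return "timed out"
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler


if __name__ == "__main__":
    # A test that is merely slow is reported exactly like a hung one.
    print(run_with_watchdog(lambda: time.sleep(1.0), 0.2))
    print(run_with_watchdog(lambda: None, 0.2))
```

With an outer timeout (such as the autopkgtest one) already in place, an
inner watchdog like this mostly adds a second, stricter limit that slow
hardware can trip.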

There is also [3] to consider: we need to re-run tests when build
dependencies and/or binary package dependencies change, so that we can
detect what breaks us.

This is a bit tricky, but I'll need to get to it eventually.
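
As a rough sketch of the idea, assuming we record the dependency versions
used in each test run: compare the recorded set with the current one and
re-run when anything differs. The function names, package names and
version strings below are all made up for illustration; this is not how
debci or our tooling currently works.

```python
def changed_dependencies(previous, current):
    """Return {package: (old_version, new_version)} for every difference.

    Both arguments map package names to version strings. A package that
    was added or removed shows up with None on the missing side.
    """
    names = set(previous) | set(current)
    return {
        name: (previous.get(name), current.get(name))
        for name in sorted(names)
        if previous.get(name) != current.get(name)
    }


def needs_rerun(previous, current):
    """True if any build or binary dependency changed since the last run."""
    return bool(changed_dependencies(previous, current))


if __name__ == "__main__":
    # Hypothetical snapshots of the dependencies seen by two test runs.
    last_run = {"libfoo-dev": "1.0-1", "libbar1": "2.3-2"}
    now = {"libfoo-dev": "1.0-2", "libbar1": "2.3-2"}
    print(needs_rerun(last_run, now))
    print(changed_dependencies(last_run, now))
```

Recording which dependency changed, rather than just a yes/no answer,
would also make it easier to see what broke us when a re-run fails.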

> Sincerely,
> Cory Bloor
> 
> [1]: https://salsa.debian.org/rocm-team/debci/-/issues/6
> [2]: https://ci.rocm.debian.net/data/autopkgtest/unstable/amd64+gfx900/r/rocblas/912/log.gz
[3]: https://salsa.debian.org/rocm-team/rocm-dev-tools/-/issues/3

