
Re: Uploading rocm 6.4.X in sid



Hi Christian K. & B.,

On 2025-09-28 00:44, Christian Kastner wrote:
> On 2025-09-28 00:22, Christian BAYLE wrote:
>> Should we start to upload all possible rocm 6.4.x into sid ?
>>
>> Any objection if i start to do this for all package building in sid ?
>
> I just did a quick random sample of some 6.4 packages and GPU
> architectures, and all of their tests had failed.
>
> In my opinion, these packages are unfit even for unstable. If they are
> uploaded to unstable, then they need RC bugs so that they don't migrate
> to testing. At least for the officially supported architectures.

Experimental is quite a mess, it's true. There's a huge delta between it and unstable, which has grown for months. The compiler has been swapped out twice and there are multiple transitions stacked up. For this reason, I'm not sure that the test results with the existing binaries on experimental are an accurate representation of what we would get if we uploaded the same sources to unstable.

I'd been uploading the packages to an Ubuntu 25.04 PPA and I was seeing good results in my testing with ROCm 6.4.3 / LLVM 20. With that said, there's at least one major known issue: the fallback architecture patches need to be extended. Gavin Zhao helpfully pointed this out for us last year, and provided some patches for comgr that I've not yet applied.

For this reason, I would suggest that when browsing the test results, you only look at the GPU architectures that we're directly building for. Until those patches are applied, it's an expected failure if the GPU is not directly listed in the pkg-rocm-tools target list.
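
As a rough sketch of that filtering: the target set below is a made-up placeholder rather than the actual pkg-rocm-tools list, and the results dictionary is a hypothetical format, not something the CI site is known to export.

```python
# Placeholder subset of directly built targets (assumption for illustration).
# The authoritative list is whatever pkg-rocm-tools defines.
DIRECT_TARGETS = {"gfx900", "gfx1030", "gfx1100"}


def classify(results, direct_targets=DIRECT_TARGETS):
    """Split {arch: passed} results into the ones worth reading and the
    failures that are expected until the fallback patches are extended."""
    relevant = {}
    expected_failures = []
    for arch, passed in results.items():
        if arch in direct_targets:
            # Directly built architecture: the result is meaningful.
            relevant[arch] = passed
        elif not passed:
            # Not a direct target and failing: expected for now.
            expected_failures.append(arch)
    return relevant, sorted(expected_failures)
```

Passing results on architectures outside the target set are simply ignored here, since they say little about the state of the fallback patches.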

On 2025-09-28 00:44, Christian Kastner wrote:
> (Incidentally, rocprim 6.4 seems to pass on many
> architectures, in contrast to the other 6.4 packages).

That's because the binary is so old that it was built against ROCm 5.7 from unstable, where the fallback architecture patches are still working.

>> I see for example that rocprim was queued from experimental src
>> https://ci.rocm.debian.net/packages/r/rocprim/
> The queue is a bit misleading. It's long because it seems to be stuck
> again for and gfx1011, gfx1012.

The commonality between gfx1101 and gfx1102 seems to be that I'd bought a couple more of those GPUs, so each of those systems has two GPUs. It looks like a driver crash is causing the systems to hang.

Sincerely,
Cory Bloor

