
Preparing Argo and Lyra for the CI (Was: Preparing Ursa and Lyra for the CI)



Hi folks,

I've confirmed that the AMD FirePro S9300 X2 is a Fiji GPU. It predates Polaris 10 (a.k.a. Ellesmere), so I felt "Ursa" was not appropriate. Fiji is from the Pirate Islands series, so I've renamed "Ursa" to "Argo".

Argo and Lyra are now connected to the Debian ROCm CI and running jobs for gfx803 and gfx900 [1]. There were some firmware troubles, so it was a surprisingly long road to get to this point. Once the hardware was working, Christian was of great help in getting the CI software configured. The CI still lists many failures on gfx803 and gfx900, but new builds should be working for some packages.

One failing test suite is rocfft's, which times out after five hours [2]. These old servers have terrible single-thread performance, so the suite takes a long time to run. The rocsparse, rocblas, and rocsolver packages are also failing: their tests crash with the error "Illegal instruction" [3]. I've not yet determined the cause, but the crash does not occur when the QEMU CPU model is configured as pass-through. It's not clear to me why the problem is not seen on the gfx1030 CI machine.
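Since the crash goes away with a pass-through CPU model, one plausible (unconfirmed) explanation is that the default QEMU CPU model hides an instruction-set extension that the test binaries use. A minimal sketch for narrowing that down, assuming x86 Linux on both sides and that the contents of /proc/cpuinfo have been saved from the host and from the guest, is to diff the two "flags" lines. The function names here are hypothetical helpers, not part of any CI tooling.

```python
def parse_flags(cpuinfo_text):
    """Return the CPU feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_in_guest(host_cpuinfo, guest_cpuinfo):
    """List flags the host advertises but the guest does not, sorted by name."""
    return sorted(parse_flags(host_cpuinfo) - parse_flags(guest_cpuinfo))


if __name__ == "__main__":
    # Assumed file names: copies of /proc/cpuinfo taken on the host and in the VM.
    with open("host-cpuinfo.txt") as host, open("guest-cpuinfo.txt") as guest:
        for flag in missing_in_guest(host.read(), guest.read()):
            print(flag)
```

Any flag in the output is only a candidate. Confirming which instruction actually faults would still require looking at the crashing test under a debugger.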

Argo and Lyra use a combined total of 450 W at idle, so I might shut them down when the job queue is empty. I'm sure we can do something clever with IPMI to only boot the systems when they're needed, but for now I'll handle it manually.
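The IPMI idea could be sketched roughly as follows. This is an assumption-laden outline, not a tested setup: the BMC host name and user are placeholders, the number of queued jobs is passed in because how to query the CI queue is not covered here, and the password is expected in the IPMI_PASSWORD environment variable (which is what ipmitool's -E option reads).

```python
import subprocess


def plan_power_action(queued_jobs, powered_on):
    """Decide what to do: 'on' to boot, 'soft' for an ACPI shutdown, or None."""
    if queued_jobs > 0 and not powered_on:
        return "on"
    if queued_jobs == 0 and powered_on:
        return "soft"
    return None


def ipmitool_command(bmc_host, user, action):
    """Build an ipmitool invocation for a chassis power action over IPMI-over-LAN."""
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-E",
            "chassis", "power", action]


def apply_power_policy(bmc_host, user, queued_jobs, powered_on):
    """Run the planned action, if any, and return it (None means nothing was done)."""
    action = plan_power_action(queued_jobs, powered_on)
    if action is not None:
        subprocess.run(ipmitool_command(bmc_host, user, action), check=True)
    return action
```

For example, apply_power_policy("argo-bmc.example", "admin", queued_jobs=2, powered_on=False) would issue "chassis power on". A real deployment would also need to wait for running jobs to finish before a soft shutdown.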

Sincerely,
Cory Bloor

[1]: https://ci.rocm.debian.net/
[2]: https://ci.rocm.debian.net/packages/r/rocfft/unstable/amd64+gfx900/
[3]: https://ci.rocm.debian.net/packages/r/rocblas/unstable/amd64+gfx900/

