
Debian ROCm CI troubles



Hi Christian,

There seems to be something wrong with the head node for the ROCm Debian CI [1]. There have been many new uploads, but it doesn't seem to be running jobs for them. I'm also seeing an Internal Server Error when I try to manually request jobs. We would really benefit from having the CI available during the ROCm 5.7 -> 6.1 -> 6.4 and LLVM 17 -> 19/20 updates. I hate to ask anything more from you, but your expertise with this system is unmatched. Do you think you could give it a kick and get it working again?

If there are folks on this list who want to lend a hand but aren't sure how to help out with ROCm, then I would suggest that contributing to DebCI would be greatly beneficial. Aside from fixing the bugs that cause the queues to stall, it would be nice to improve the user interface so that more information is displayed directly on the website about what the DebCI head node is doing. I'd like to see information about the status of worker nodes, the state of the queues (e.g., jobs in progress), more results visible at a glance (e.g., percentage failed rather than just pass/fail), and a more useful main page. I think a lot of these improvements could be upstreamed into the official DebCI.

We also need to increase the bus factor of the ROCm-enhanced DebCI system by growing the number of individuals with a solid understanding of it. Fixing bugs and adding features would be a great way to learn about it.

Sincerely,
Cory Bloor

[1]: https://ci.rocm.debian.net/

