Debian ROCm CI troubles
Hi Christian,
There seems to be something wrong with the head node for the ROCm Debian CI
[1]. There have been many new uploads, but it doesn't seem to be running
jobs for them. I'm also seeing an Internal Server Error when I try to
manually request jobs. We would really benefit from having the CI
available during the ROCm 5.7 -> 6.1 -> 6.4 and LLVM 17 -> 19/20
updates. I hate to ask anything more from you, but your expertise with
this system is unmatched. Do you think you could give it a kick and get
it working again?
If there are folks on this list that want to lend a hand but aren't sure
how to help out with ROCm, then I would suggest that contributing to
DebCI would be greatly beneficial. Aside from fixing the bugs that cause
the queues to stall, it would be nice to improve the user interface so
that there is more information displayed directly on the website about
what the DebCI head node is doing. I'd like to see information about the
status of worker nodes, the state of the queues (e.g., jobs in
progress), more results visible at a glance (e.g., percentage failed
rather than just pass/fail), and a more useful main page. I think a lot
of these improvements could be upstreamed into the official DebCI.
We also need to increase the bus factor, i.e., the number of individuals
with a solid understanding of the ROCm-enhanced DebCI system. Fixing
bugs and adding features would be a great way to learn about it.
Sincerely,
Cory Bloor
[1]: https://ci.rocm.debian.net/