Howdy folks,
The Debian ROCm Continuous Integration system [1] has seen significant expansion over the past few months, so I thought I'd share an update. In September, I mentioned I was setting up a lot of continuous integration systems [2]. With Christian's help, I've added the gfx90c, gfx1011, gfx1012, and gfx1033 architectures to the production Debian ROCm CI, while gfx902 and gfx1102 are now on ci-test. There are now 19 different AMD GPU architectures being tested on each package upload.
I'm happy to report that gfx90c, gfx1011 and gfx1012 are working
well in Debian, and the continuous integration machines for gfx902
and gfx1033 have done their jobs in catching bugs on those
architectures.
A lot has happened, so I'll just describe half of the new systems for now.
The Argo and Lyra continuous integration machines have been taken offline and moved to the University of Calgary. The university will be hosting those systems for Debian in their data centre. In exchange, we will be sharing the hardware with their researchers. AMD has also donated an MI100 server to the University of Calgary. All three systems are currently in boxes at the university awaiting installation.
Arctophylax is the MI100 server donated by AMD. It is an
HPE ProLiant XL675d Gen10 Plus (6U) with 8x MI100 GPUs (Arcturus;
gfx908), 2x EPYC 7513 CPUs (Milan; Zen 3; 32 cores), 512 GB of RAM
at 3200 MHz, and 4x ConnectX-6 2x200G NICs. It will be available
for use by Debian developers, UoC researchers, and the Debian ROCm
CI. Unfortunately, I don't have a picture of this beast.
While Argo and Lyra are offline, I've set up two computers to take over their duties:
Vultur is an EPYC 7543P (Milan; Zen 3; 32 cores) system built on the Supermicro H12SSL-NT with 128 GB of RAM at 2933 MHz and a Corsair AX1600i PSU [3]. It was intended to be a 4x MI50 system (Vega 20; gfx906), but it will temporarily be replacing Lyra. It contains an XFX BC-160 (Navi 12; gfx1011) and an AMD Radeon Vega Frontier Edition (Vega 10; gfx900). For this system, AMD provided the CPU and Corsair provided a discount on the power supply. I provided the GPUs, the case and rack rails. This system was made possible by Debian project funding, which covered the rest of the components.
Aquila is an EPYC 7713P (Milan; Zen 3; 64 cores) system built on the Supermicro H12SSL-NT with 256 GB of RAM at 3200 MHz and a Corsair AX1600i PSU [4]. It was intended to be a 4x MI100 system, but that seems redundant now that we have Arctophylax. It has one AMD MI8 GPU (Fiji; gfx803) installed and is currently acting as a temporary replacement for Argo [2]. Aquila is underutilized with just one GPU, but that is temporary; once Argo is back online, Aquila will be filled with MI25, MI50, and/or MI60 GPUs. As with Vultur, AMD provided the CPU, Corsair provided a discount on the power supply, and I provided the GPUs, the case, and the rack rails; Debian project funding covered the rest of the components.
I've been looking for Infinity Fabric Link cards for the MI50, MI60, or Radeon PRO VII without any success. They seem to be impossible to find on the used market; I've tried to purchase them several times, but no seller has been able to fulfill the order. If anyone can locate a bridge card for any of those GPUs, it would be greatly appreciated. I would very much like Aquila and Vultur to have test coverage of the XGMI driver for multi-GPU setups.
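For anyone curious what that test coverage might look like: the first thing I'd want to check is whether the GPUs can access each other's memory directly. Below is a rough sketch of such a check using the HIP runtime. It only reports whether peer access is available at all, whether that ends up going over XGMI or PCIe, so treat it as an illustration rather than a real XGMI test. It assumes a working hipcc and builds with 'hipcc peer-check.cpp -o peer-check'.

    // peer-check.cpp: ask the HIP runtime whether each pair of GPUs
    // can access the other's memory directly. This is an
    // illustrative sketch, not part of the Debian ROCm CI tooling.
    #include <hip/hip_runtime.h>
    #include <cstdio>

    int main() {
        int count = 0;
        if (hipGetDeviceCount(&count) != hipSuccess || count < 2) {
            std::printf("need at least two HIP devices\n");
            return 1;
        }
        for (int a = 0; a < count; ++a) {
            for (int b = 0; b < count; ++b) {
                if (a == b)
                    continue;
                int ok = 0;
                hipDeviceCanAccessPeer(&ok, a, b);
                std::printf("GPU %d -> GPU %d: peer access %s\n",
                            a, b, ok ? "yes" : "no");
            }
        }
        return 0;
    }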
I've also set up a few APU test systems:
Cardplayers and Threeskulls are two nearly
identical Ryzen 7 5700G APU (Cezanne; Zen 3; 8 cores; gfx90c)
systems built on the ASRock Rack X570D4U motherboard [5]. They
will each have 64 GB of RAM at 3200 MHz, but I'm in the process of
an RMA, so they only have 32 GB at the moment. One was intended to
be used for testing the Ryzen 5 3400G (Picasso; Zen+; 4 cores;
gfx902), but I couldn't get the graphics working with that
combination of processor and motherboard.
Daydream is a Beelink SER 4 Mini PC based on the Ryzen 7 4700U APU (Renoir; Zen 2; 8 cores; gfx90c) [6]. It has 64 GB of RAM at 3200 MHz. I'd thought this APU would be gfx909, but I was mistaken, so Daydream is actually the third gfx90c machine. Having three workers for gfx90c is probably not a bad thing overall.
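As an aside, the reported architecture is easy to double-check: the HIP runtime exposes the LLVM target name of each device. The snippet below is only an illustration, assuming a working hipcc; build it with 'hipcc check-gfx.cpp -o check-gfx' and run it on the machine in question.

    // check-gfx.cpp: print the gfx target reported by each device.
    // An illustrative sketch, not part of the Debian ROCm CI tooling.
    #include <hip/hip_runtime.h>
    #include <cstdio>

    int main() {
        int count = 0;
        if (hipGetDeviceCount(&count) != hipSuccess || count == 0) {
            std::printf("no HIP devices found\n");
            return 1;
        }
        for (int i = 0; i < count; ++i) {
            hipDeviceProp_t prop;
            if (hipGetDeviceProperties(&prop, i) != hipSuccess)
                continue;
            // gcnArchName is the LLVM target string, e.g. "gfx90c"
            // possibly followed by feature flags such as ":xnack-"
            std::printf("device %d: %s (%s)\n", i, prop.name,
                        prop.gcnArchName);
        }
        return 0;
    }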
At this point, my 42U rack is nearly full [7]. I've not sorted
through all my power usage logs, but the highest peak power draw
reported on my PDU's display was ~1.7 kW. If Argo, Lyra and
Arctophylax work well, I will probably move more systems to the
University of Calgary in 2025. My electrical power usage is about
double that of the average Canadian home, so I will be installing
10 kW of rooftop solar in the spring, which should more than
offset the usage of the systems I'm hosting.
Sincerely,
Cory Bloor
[1]: https://ci.rocm.debian.net/
[2]: https://lists.debian.org/debian-ai/2024/09/msg00007.html
[3]: https://slerp.xyz/img/misc/2024/vultur-open.jpg
[4]: https://slerp.xyz/img/misc/2024/aquila-mi8-open.jpg
[5]: https://slerp.xyz/img/misc/2024/threeskulls-open.jpg
[6]: https://slerp.xyz/img/misc/2024/daydream-with-ddr4-for-size.jpg
[7]: https://slerp.xyz/img/misc/2024/debci-rack.jpg