ROCm on EC2
Hey there folks,
I'm not sure if this would work for Debian, but I thought it was worth
mentioning that Amazon EC2 now has g4ad instances with AMD Radeon Pro
V520 GPUs [1]. These are Navi 12 GPUs (gfx1011) and are not officially
supported by ROCm, but there has been unofficial support in the lower
parts of the stack for a while [2] and support in the math libraries is
a work-in-progress [3].
I'm not very familiar with EC2 yet, but I think you'd need to build your
own Amazon Machine Images (AMIs) to test with the Debian Sid kernel.
Still, all the userland packaging components could be tested on a
run-of-the-mill g4ad instance in a Docker container. When choosing an
instance size, the main concern is the amount of RAM available. A
g4ad.4xlarge (64 GiB of RAM) is sufficient to run all tests, but I
suspect most components can be tested with just a g4ad.xlarge
(16 GiB of RAM).
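As a rough sketch of the container approach, the Python snippet below
builds a `docker run` command line that passes the GPU device nodes
through to a container. The /dev/kfd and /dev/dri device paths are what
ROCm containers commonly need. The debian:sid image, the "video" group
and the helper name rocm_docker_command are assumptions for
illustration; the group actually required depends on how the device
nodes are owned on the host.

```python
import shlex


def rocm_docker_command(image="debian:sid", command=("bash",)):
    """Build a `docker run` argv that exposes the AMD GPU device nodes.

    Hypothetical helper: the image and group are assumptions and may
    need adjusting for a given host.
    """
    argv = [
        "docker", "run", "--rm", "-it",
        "--device=/dev/kfd",     # ROCm compute interface
        "--device=/dev/dri",     # GPU render nodes
        "--group-add", "video",  # assumed group; depends on host permissions
        image,
    ]
    argv.extend(command)
    return argv


if __name__ == "__main__":
    # Print the command so it can be inspected or pasted into a shell.
    print(shlex.join(rocm_docker_command()))
```

Printing the command rather than executing it keeps the sketch safe to
run on a machine without a GPU.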
It's perhaps worth noting that you can't just spin up one of these
instances. The default vCPU limit for "Running On-Demand G and VT
instances" is 0, so I had to request a limit increase. It may just be
because my account was brand new, but my request was initially denied
and I had to escalate the issue by contacting their sales team. If you're
running batch jobs for these tests, you may prefer Spot Instances, which
appear to fall under a separate limit.
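Because the limit is expressed in vCPUs rather than in instances, the
size of the increase to request depends on the instance size. The
Python sketch below does that arithmetic. The vCPU counts (4 for a
g4ad.xlarge, 16 for a g4ad.4xlarge) are an assumption taken from the
AWS instance-type table rather than from anything above, and the
function names are hypothetical.

```python
# vCPUs per instance size. Assumed values; check the AWS instance-type
# table before relying on them.
G4AD_VCPUS = {
    "g4ad.xlarge": 4,
    "g4ad.4xlarge": 16,
}


def required_vcpu_limit(instance_type, count=1):
    """Return the vCPU limit needed to run `count` instances at once."""
    return G4AD_VCPUS[instance_type] * count


def instances_within_limit(instance_type, vcpu_limit):
    """Return how many instances of this type fit under a vCPU limit."""
    return vcpu_limit // G4AD_VCPUS[instance_type]


if __name__ == "__main__":
    # With the default limit of 0, no instance can be started.
    print(instances_within_limit("g4ad.xlarge", 0))
    # Limit needed for a single g4ad.4xlarge.
    print(required_vcpu_limit("g4ad.4xlarge"))
```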
Sincerely,
Cory Bloor
[1]: https://aws.amazon.com/ec2/instance-types/g4/
[2]: https://threedots.ovh/blog/2021/11/what-is-amd-rocm/
[3]: https://github.com/ROCmSoftwarePlatform/rocSOLVER/pull/374