
ROCm on EC2

Hey there folks,

I'm not sure if this would work for Debian, but I thought it was worth mentioning that Amazon EC2 now has g4ad instances with AMD Radeon Pro V520 GPUs [1]. These are Navi 12 GPUs (gfx1011) and are not officially supported by ROCm, but there has been unofficial support in the lower parts of the stack for a while [2] and support in the math libraries is a work-in-progress [3].

I'm not very familiar with EC2 yet, but I think you'd need to build your own Amazon Machine Images (AMIs) to test with the Debian Sid kernel. Still, all the userland packaging components could be tested on a run-of-the-mill g4ad instance in a Docker container. When choosing an instance size, the main concern is the amount of RAM available. A g4ad.4xlarge (64 GiB of RAM) is sufficient to run all tests, but I suspect most components can be tested with just a g4ad.xlarge (16 GiB of RAM).
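For anyone who wants to try the container route, here is a rough sketch of how the GPU might be passed through to Docker. It only builds the `docker run` argument list and assumes the host kernel already exposes /dev/kfd and /dev/dri. The helper name `rocm_docker_command` and the `debian:sid` image are placeholders for illustration, not something tested on a g4ad instance.

```python
import shlex


def rocm_docker_command(image, command=("bash",)):
    """Build a `docker run` argv that exposes the AMD GPU to a container.

    Hypothetical helper: assumes the host's amdgpu driver provides
    /dev/kfd and /dev/dri, which the ROCm userland needs.
    """
    return [
        "docker", "run", "--rm", "-it",
        "--device=/dev/kfd",      # ROCm compute interface
        "--device=/dev/dri",      # GPU render nodes
        "--group-add", "video",   # device nodes are typically owned by this group
        image, *command,
    ]


if __name__ == "__main__":
    # Print the command line rather than running it, so it can be inspected first.
    print(shlex.join(rocm_docker_command("debian:sid")))
```

On some hosts the render nodes belong to a `render` group instead of `video`, so the `--group-add` value may need adjusting.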

It's perhaps worth noting that you can't just spin up one of these instances. The default vCPU limit for "Running On-Demand G and VT instances" is 0, so I had to request a limit increase. It may just be because my account was brand new, but my request was initially denied and I had to escalate the issue by contacting their sales team. If you're running batch jobs for these tests, you may prefer Spot Instances, which appear to have a separate limit.
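Since the limit is counted in vCPUs rather than instances, it helps to work out the number to ask for before filing the request. A small sketch of that arithmetic follows. The vCPU counts are taken from the AWS instance-type listing and should be double-checked there; `vcpus_needed` is just an illustrative helper.

```python
# vCPUs per instance size, as listed by AWS (assumption: verify against [1]).
G4AD_VCPUS = {
    "g4ad.xlarge": 4,
    "g4ad.4xlarge": 16,
}


def vcpus_needed(instance_type, count=1):
    """Return the vCPU limit required to run `count` instances at once."""
    return G4AD_VCPUS[instance_type] * count
```

For example, one g4ad.4xlarge needs a limit of 16, while three g4ad.xlarge instances running in parallel need 12.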

Cory Bloor

[1]: https://aws.amazon.com/ec2/instance-types/g4/
[2]: https://threedots.ovh/blog/2021/11/what-is-amd-rocm/
[3]: https://github.com/ROCmSoftwarePlatform/rocSOLVER/pull/374
