Re: RFC: Strategy for getting ROCm test coverage
Hi Christian,
On 2023-03-12 08:38, Christian Kastner wrote:
> (2) There are also concerns regarding security and stability of
> drivers and hardware, in the sense of a potentially increased
> attack surface and/or more maintenance work.
There are a few cloud vendors that provide access to servers with AMD
GPUs. With the scale that they operate on, I would expect them to have
those sorts of issues handled. Perhaps that is an option?
AWS EC2 has G4ad instances with Radeon Pro V520 GPUs [1]. That is Navi
12 hardware (gfx1011) [2]. There are plenty of options for single-GPU
VMs and a few options for multi-GPU VMs. If we're able to use spot
instances and only spin up VMs when we need them, it might end up being
reasonably cost-effective.
Microsoft Azure has NVv4 instances with MI25 GPUs [3]. That is Vega 10
hardware (gfx900) [4]. Unfortunately, I see that these instances are
Windows-only, so I don't think we will be able to use these.
Incidentally, the MI25 and MI60 GPUs have become quite affordable on the
used market. MI25 is practically being given away on eBay right now. I
suspect that not many people have the forced-air server racks these
cards require, but they would make a great testing platform for us if we
had somewhere to put them.
Sincerely,
Cory Bloor
[1]: https://aws.amazon.com/ec2/instance-types/g4/
[2]: https://www.techpowerup.com/gpu-specs/radeon-pro-v520.c3755
[3]: https://learn.microsoft.com/en-us/azure/virtual-machines/nvv4-series
[4]: https://www.techpowerup.com/gpu-specs/radeon-instinct-mi25.c2983