Re: pytorch-cuda: What range of GPUs to support?

To: debian-ai@lists.debian.org
Cc: debian-science@lists.debian.org
Subject: Re: pytorch-cuda: What range of GPUs to support?
From: "M. Zhou" <lumin@debian.org>
Date: Tue, 18 Feb 2025 21:15:10 -0500
Message-id: <def4afd22df3b60ed97210e30dc320cb82c91254.camel@debian.org>
In-reply-to: <747cd15759c1bc3caff9c3110233de251bc08029.camel@debian.org>
References: <747cd15759c1bc3caff9c3110233de251bc08029.camel@debian.org>

On Tue, 2025-02-18 at 18:24 -0500, M. Zhou wrote:
> Hi Team,
> 
> (CC'ed debian-science, but please redirect discussion to -ai@l.d.o)
> 

After reading the CUDA documentation, I realized that the previous list
is problematic: some GPUs, such as the A100, would be left unsupported.
https://docs.nvidia.com/cuda/hopper-compatibility-guide/index.html
The Nvidia A100 has compute capability 8.0. The only 8.x cubin in the
list "6.1;7.5;8.6" is 8.6, which is higher than the device's capability,
and a cubin only runs on devices of the same major version with an equal
or higher minor version. So, according to the documentation, the built
binary will not run on the A100.
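
To make the compatibility rule concrete, here is a minimal sketch of a
coverage check. It assumes the usual CUDA rules (a cubin runs on devices
with the same major version and an equal or higher minor version; PTX
can be JIT-compiled for any equal or higher compute capability). The
helper names parse_arch_list and can_run are hypothetical, and the
sketch only handles plain "X.Y" or "X.Y+PTX" entries.

```python
def parse_arch_list(spec):
    """Parse a list such as "6.1;7.5;8.6+PTX" into (major, minor, has_ptx) tuples."""
    entries = []
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue
        has_ptx = item.endswith("+PTX")
        version = item[:-len("+PTX")] if has_ptx else item
        major, minor = version.split(".")
        entries.append((int(major), int(minor), has_ptx))
    return entries


def can_run(device, spec):
    """Return True if a device with compute capability (major, minor) is covered by spec."""
    dev_major, dev_minor = device
    for major, minor, has_ptx in parse_arch_list(spec):
        # cubin rule: same major version, minor version not above the device's
        if major == dev_major and minor <= dev_minor:
            return True
        # PTX rule: can be JIT-compiled for any equal or higher capability
        if has_ptx and (major, minor) <= (dev_major, dev_minor):
            return True
    return False


print(can_run((8, 0), "6.1;7.5;8.6"))  # A100 against the previous list
print(can_run((8, 0), "5.0;6.0;7.0;7.5;8.0;8.6;9.0+PTX"))
```

Under these assumptions, the first call reports that the A100 is not
covered by the previous list, and the second reports that it is covered
once 8.0 is included.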

I intended to exclude GPUs older than the GTX 1080 (6.1), but the
upstream PyTorch package on PyPI seems to support even older GPUs:

In [3]: torch.cuda.get_arch_list()
Out[3]: ['sm_50', 'sm_60', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'sm_90']

So I'm going to revise the architecture list to follow upstream.
That will make the binary much larger, but I think it is a less
questionable configuration. Maybe something like this:

  5.0;6.0;7.0;7.5;8.0;8.6;9.0+PTX
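
As an illustration only, a list in this format could be derived from the
names that torch.cuda.get_arch_list() reports. The sketch below
hard-codes the upstream output quoted above instead of importing torch.
The helpers sm_to_version and build_arch_list are hypothetical, and
putting +PTX on the newest entry is an assumption that mirrors the
proposed list.

```python
def sm_to_version(sm_name):
    """Convert a name such as "sm_86" into the dotted form "8.6"."""
    digits = sm_name[len("sm_"):]
    # The last digit is the minor version; the digits before it are the major version.
    return digits[:-1] + "." + digits[-1]


def build_arch_list(sm_names, ptx_on_newest=True):
    """Join sm_* names into a semicolon-separated list, oldest first."""
    versions = sorted(
        (sm_to_version(name) for name in sm_names if name.startswith("sm_")),
        key=lambda v: tuple(int(part) for part in v.split(".")),
    )
    if ptx_on_newest and versions:
        # Assumption: attach PTX only to the newest architecture.
        versions[-1] += "+PTX"
    return ";".join(versions)


upstream = ['sm_50', 'sm_60', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'sm_90']
print(build_arch_list(upstream))  # 5.0;6.0;7.0;7.5;8.0;8.6;9.0+PTX
```

Entries that do not start with "sm_" are skipped by this sketch.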

I hope this will not lead to linker overflow.
