
Re: cuda support of src:pytorch

Hi Christian,

On Thu, Dec 03, 2020 at 09:23:51AM +0100, Christian Kastner wrote:
> However, I anticipate that, like the Nvidia driver and CUDA, this will
> be backported regularly. Support for the RTX 3000 cards only landed very
> recently in CUDA and cuDNN, and there still appear to be some issues.

Ah, what you said is probably true. Since we will have to provide updates
through the backports channel anyway, I don't think we need to rush a
CUDA build of PyTorch. Let's focus on consolidating the quality of the
existing packages for the bullseye release, as proposed at the
beginning.

Not rushing this will, at least, make the end of the year more
comfortable for me.

> So users of these cards (which, until bullseye+1, will probably be quite
> a few) will all but certainly have to rely on backports.

I'm facing exactly these issues. One of the servers in my lab has 8
RTX 3090 cards, but the NVIDIA LTS driver seems too buggy: a training
program running in multi-process mode (distributed data parallel) hangs
immediately. I suspect something is wrong in the driver, NCCL, or
similar. The same program works fine on a machine with 8 RTX 2080 Ti
cards.
