
Re: remaining issues for pytorch-rocm



On 10/11/24 07:59, Cordell Bloor wrote:
Hi Mo,

These are expected issues. In my opinion, they should both be fixed in PyTorch.

On 2024-10-11 02:22, Mo Zhou wrote:
1. rocm_version.h cannot be found in any rocm package.

rocm_version.h is provided by librocm-core. The librocm-core library provides two functions: getROCmVersion and getROCmInstallPath. In my opinion, these functions are conceptually flawed and should never be used.

Based on the explanations I agree that rocm-core makes no sense at all.
We do not need an empty package to do what should be done in cmake for each individual component.

2. hipblaslt is missing while hipblas is present.

The hipblaslt library only supports CDNA 2, CDNA 3 and RDNA 3 GPUs. That is a subset of the AMD GPUs that PyTorch supports. The hipblaslt library could clearly be made optional, even if the PyTorch build system doesn't treat it as such.

While we should certainly finish packaging hipblaslt, I think we may also want to help the upstream PyTorch project make this an optional dependency. If nothing else, the library is ~10 GB in ROCm 6.3, and users with Vega and RDNA 2 GPUs may appreciate the disk space savings.

I'm not sure whether it is easy to drop this dependency from the pytorch side. Would be glad to see if it can be made optional.
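
A minimal sketch of the optional-dependency idea discussed above: decide from the list of GPU build targets whether hipblaslt could be useful at all. The function name and the mapping from architecture families to gfx target names are assumptions made for illustration; they are not taken from the PyTorch build system or from the hipblaslt sources.

```python
# Assumed mapping of the families named in the thread to LLVM gfx targets.
# Illustrative only; the authoritative list lives in hipblaslt itself.
HIPBLASLT_TARGETS = {
    "gfx90a",                        # CDNA 2
    "gfx940", "gfx941", "gfx942",    # CDNA 3
    "gfx1100", "gfx1101", "gfx1102", # RDNA 3
}


def wants_hipblaslt(build_targets):
    """Return True if any requested target is in the assumed supported set.

    Targets may carry feature suffixes such as "gfx90a:xnack+", so only
    the part before the first colon is compared.
    """
    for target in build_targets:
        base = target.split(":", 1)[0].strip().lower()
        if base in HIPBLASLT_TARGETS:
            return True
    return False


if __name__ == "__main__":
    # Vega (gfx900/gfx906) and RDNA 2 (gfx1030) only: hipblaslt could be skipped.
    print(wants_hipblaslt(["gfx900", "gfx906", "gfx1030"]))
    # A mixed build that includes CDNA 2 would still want it.
    print(wants_hipblaslt(["gfx1030", "gfx90a:xnack+"]))
```

A build that targets only Vega and RDNA 2 would return False here, which is the case where skipping the dependency saves the disk space mentioned above.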

