Re: Packaging hipblaslt in progress



Hi Kari,

On 2024-03-15 12:45, Kari Pahula wrote:
################################################################################
# Tensile Create Library
Tensile::WARNING: Did not detect SupportedISA: [(8, 0, 3), (9, 0, 0), (9, 0, 6), (9, 0, 8), (9, 0, 10), (9, 4, 0), (9, 4, 1), (9, 4, 2), (10, 1, 0), (10, 1, 1), (10, 1, 2), (10, 3, 0), (11, 0, 0), (11, 0, 1), (11, 0, 2)]; cannot benchmark assembly kernels.
make[3]: Leaving directory '/home/kaol/deb/hipblaslt/6/hipblaslt/obj-x86_64-linux-gnu'
[ 11%] Built target hipblaslt-test-data
/home/kaol/deb/hipblaslt/6/hipblaslt/obj-x86_64-linux-gnu/library/build_tmp/ops/S_8_32_gfx803.s:257:1: error: directive requires gfx90a+
.amdhsa_accum_offset 8
^~~~~~~~~~~~~~~~~~~~
/home/kaol/deb/hipblaslt/6/hipblaslt/obj-x86_64-linux-gnu/library/build_tmp/ops/S_8_32_gfx803.s:258:1: error: unknown directive
.amdhsa_next_free_vgpr .amdgcn.next_free_vgpr
^
/home/kaol/deb/hipblaslt/6/hipblaslt/obj-x86_64-linux-gnu/library/build_tmp/ops/S_8_32_gfx803.s:259:1: error: unknown directive
.amdhsa_next_free_sgpr .amdgcn.next_free_sgpr
^
/home/kaol/deb/hipblaslt/6/hipblaslt/obj-x86_64-linux-gnu/library/build_tmp/ops/S_8_32_gfx803.s:260:1: error: unknown directive
.end_amdhsa_kernel
^

And more like that.  Later on:

clang++-17: error: no such file or directory: '/home/kaol/deb/hipblaslt/6/hipblaslt/obj-x86_64-linux-gnu/library/build_tmp/ops/S_8_32_gfx803.o'

It didn't seem like a fruitful approach to try to continue from that.

Ah. This must be why it is called "tensilelite" in the hipblaslt repo. They seem to have forked Tensile and have been patching it without regard for how their changes affect older architectures. It's "lite" because they've dropped a lot of features, including hardware compatibility.
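For reading the log above, here is a minimal sketch of how the SupportedISA tuples in the warning correspond to gfx names such as the gfx803 in the failing file name. The helper isa_to_gfx is hypothetical and is not Tensile's own code; it assumes the usual naming in which the last component (the stepping) is written as a hexadecimal digit.

```python
def isa_to_gfx(isa):
    """Format a (major, minor, stepping) tuple as a gfx name (hypothetical helper)."""
    major, minor, stepping = isa
    # Assumption: the stepping is written as one hex digit, so 10 becomes "a".
    return f"gfx{major}{minor}{stepping:x}"


# The tuples reported in the Tensile warning quoted above.
supported_isa = [
    (8, 0, 3), (9, 0, 0), (9, 0, 6), (9, 0, 8), (9, 0, 10),
    (9, 4, 0), (9, 4, 1), (9, 4, 2),
    (10, 1, 0), (10, 1, 1), (10, 1, 2), (10, 3, 0),
    (11, 0, 0), (11, 0, 1), (11, 0, 2),
]

names = [isa_to_gfx(isa) for isa in supported_isa]
print(" ".join(names))
```

Under that assumption, (8, 0, 3) is the gfx803 target whose assembly failed, and (9, 0, 10) is gfx90a, the architecture the assembler error says the directive requires.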

I've been busy with some other things but I've been looking at this
more today.  I tried "export HSA_OVERRIDE_GFX_VERSION=10.3.0" that you
suggested in your second message but I got all the same test errors
with that.  Even after just removing my compatibility patch and only
using the environment variable.

I'm afraid I'm stumped with this.  I've tried looking at how rocblas
does this but I'm coming up short with what else to try.  I can finish
with other packaging work but it won't do to upload this if I haven't
checked that it's functioning properly.

The error messages above pushed me to go back and look at what is listed as supported. It seems that the only hardware officially supported with hipBLASLt on ROCm 5.7.1 is MI200 (gfx90a) [1], although it was built for MI300 as well [2].

This is not a library that we're going to be able to port to other architectures. I suppose the good news is that AMD is donating a pair of MI210s for use by the Debian AI team. I got a shipping notification today, so the hardware should be available within a few weeks. I'll follow up privately to get you access.

I would suggest copying the upstream AMDGPU_TARGETS from [2], but dropping the gfx940 and gfx941 entries. The gfx940 and gfx941 architectures were only used in early engineering samples, not in any retail hardware. So, that would be -DAMDGPU_TARGETS="gfx90a:xnack+;gfx90a:xnack-;gfx942:xnack+;gfx942:xnack-".
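As a rough sketch of that filtering step, the snippet below drops every entry for a given architecture from a semicolon-separated target list. The function drop_architectures is hypothetical, and example_upstream is an illustrative input only: the exact form of the gfx940 and gfx941 entries is an assumption, not copied from [2].

```python
def drop_architectures(targets, dropped):
    """Remove every entry whose architecture (the part before ':') is in dropped."""
    kept = []
    for target in targets.split(";"):
        arch = target.split(":", 1)[0]  # "gfx90a:xnack+" -> "gfx90a"
        if arch not in dropped:
            kept.append(target)
    return ";".join(kept)


# Illustrative input; the gfx940/gfx941 entries are assumed, not taken from upstream.
example_upstream = (
    "gfx90a:xnack+;gfx90a:xnack-;gfx940;gfx941;gfx942:xnack+;gfx942:xnack-"
)

targets = drop_architectures(example_upstream, {"gfx940", "gfx941"})
print(f'-DAMDGPU_TARGETS="{targets}"')
```

Matching on the architecture name before the colon means the xnack variants of the remaining targets pass through unchanged.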

For most libraries, I don't bother with specializing on xnack, but Tensile is fragile and I would not deviate from the ISAs specified by upstream until we're very familiar with the library.

With regard to DFSG issues in the Tensile sources, there were a few kernels
that were provided as binary blobs. For those, the problem was that the
shader language used for the original sources did not have an open source
compiler, so the compiled sources were checked in instead. The YAML files
that were removed from rocBLAS were those that referenced the removed
shaders. I believe that the binary blobs were removed in ROCm 6, so Tensile
should no longer be DFSG burdened. However, I have not actually reviewed the
updates yet myself.

I'm not quite sure that's actually a DFSG issue.  I think it's enough
to just have the source available with an acceptable license.  Though
it's not ideal for sure.  I think there was a case about a game's
music getting a pass despite the program to generate the music being
proprietary.

The source is not publicly available. When I asked the Tensile team about publishing the original sources, they said they'd prefer to drop the binary kernels instead. The binary kernels were merely quick fixes for missing features in the Tensile kernel generator, so the Tensile team enhanced the generator and dropped the DFSG-encumbered kernels. They'd apparently been working towards that for a while.

I'm guessing it could be a future project to package
https://github.com/ROCm/Tensile and use that for both rocblas and
hipblaslt.  Buildd maintainers would like it at least, running Tensile
during a build seems to be quite a room warmer.

I've filed an ITP for rocm-tensile [1]. However, the separate Tensile
package will not have any significant effect on build times.

I suspected that to be the case.  But I agree with what you said,
using a single dependency for rocblas and hipblaslt is good practice.

I'm crossing my fingers it will be possible.

Sincerely, Cory Bloor

[1]: https://github.com/ROCm/hipBLASLt/blob/rocm-5.7.1/README.md#hardware-requirements
[2]: https://github.com/ROCm/hipBLASLt/blob/rocm-5.7.1/CMakeLists.txt#L149
