
ROCm GPU_TARGETS and GPU_ARCHS and some other points



Hello,

I'm currently working on the composable-kernel [1] package and have some questions about GPU_TARGETS and GPU_ARCHS, which are used to build the libraries.

I looked at how it is built in TheRock project and found that they only build for

-DGPU_TARGETS="gfx1100;gfx1101;gfx1102"

This builds successfully, although it takes several hours.

Looking in the CMakeLists.txt, I found that I should use
GPU_ARCHS="gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201"
to build for all architectures without tests and examples:

---
#In order to build just the CK library (without tests and examples) for all supported GPU targets
#use -D GPU_ARCHS="gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201"
#the GPU_TARGETS flag will be reset in this case in order to avoid conflicts.
#
#In order to build CK along with all tests and examples it should be OK to set GPU_TARGETS to just 1 or 2 similar architectures.
---

The README.md adds some more information:

---
NOTE: If you try setting GPU_TARGETS to a list of architectures, the build will only work if the architectures are similar, e.g., gfx908;gfx90a, or gfx1100;gfx1101;gfx11012. Otherwise, if you want to build the library for a list of different architectures, you should use the GPU_ARCHS build argument, for example GPU_ARCHS=gfx908;gfx1030;gfx1100;gfx942.
---

This raises the question of how to reconcile this with the
rocm-target-arch tool, which gives

rocm-target-arch --sep ';'
gfx803;gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1010;gfx1030;gfx1100;gfx1101;gfx1102

and composable-kernel does not build for gfx803;gfx900;gfx906.
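
One possible way to reconcile the two lists is to keep only the rocm-target-arch entries that the CMakeLists.txt comment also lists. The following is a minimal Python sketch under that assumption; the function name supported_archs is hypothetical and the two strings are copied from the outputs quoted above.

```python
# Architectures listed in the CMakeLists.txt comment quoted above.
CK_ARCHS = "gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201"

# Output of `rocm-target-arch --sep ';'` as quoted above.
DISTRO_ARCHS = (
    "gfx803;gfx900;gfx906;gfx908;gfx90a;gfx942;"
    "gfx1010;gfx1030;gfx1100;gfx1101;gfx1102"
)


def supported_archs(distro: str, upstream: str, sep: str = ";") -> str:
    """Keep the distro targets that upstream also lists, in distro order."""
    upstream_set = set(upstream.split(sep))
    return sep.join(a for a in distro.split(sep) if a in upstream_set)


if __name__ == "__main__":
    # Candidate value for -DGPU_ARCHS=...
    print(supported_archs(DISTRO_ARCHS, CK_ARCHS))
```

With the two lists as quoted, this drops gfx803, gfx900 and gfx906, and also gfx1010, which is not in the CMakeLists.txt list either.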

I could do several builds on a per-architecture basis, which has the advantage of building the tests and examples, but this creates conflicting per-architecture packages.
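
To illustrate the per-architecture approach, here is a rough Python sketch that groups targets into families so that each family could get its own GPU_TARGETS build. It assumes that "similar" means "same name once the last two characters are dropped" (gfx908 and gfx90a both become gfx9); this is only a guess based on the README examples, and the helper group_by_family is hypothetical.

```python
from collections import defaultdict


def group_by_family(archs: str, sep: str = ";") -> dict[str, list[str]]:
    """Group targets by family, e.g. "gfx90a" -> "gfx9", "gfx1101" -> "gfx11".

    Assumption: the last two characters are the minor/stepping part.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for arch in archs.split(sep):
        groups[arch[:-2]].append(arch)
    return dict(groups)


if __name__ == "__main__":
    archs = "gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102"
    for family, members in group_by_family(archs).items():
        # One cmake invocation (and one build tree) per family.
        print(f'{family}: -DGPU_TARGETS="{";".join(members)}"')
```

Whether gfx942 is really similar enough to gfx908;gfx90a for one GPU_TARGETS build would need to be checked; the grouping rule is only an assumption.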

On the other hand, the build for all architectures sometimes takes more than 40 GB of memory per core, which will be difficult to run on autobuilders.
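
As a back-of-the-envelope illustration of the memory constraint, the sketch below estimates how many parallel compile jobs a builder could run if each job may need about 40 GB. The function max_jobs and the example machine sizes are hypothetical, not measurements.

```python
def max_jobs(total_mem_gb: int, cores: int, per_job_gb: int = 40) -> int:
    """Number of parallel jobs limited by both core count and memory.

    Always returns at least 1, even if a single job may not fit in memory.
    """
    return max(1, min(cores, total_mem_gb // per_job_gb))


if __name__ == "__main__":
    # Hypothetical builders: (RAM in GB, core count)
    for mem, cores in [(32, 8), (128, 16), (512, 64)]:
        print(f"{mem} GB, {cores} cores -> -j{max_jobs(mem, cores)}")
```

Under these assumptions, a builder with less than 40 GB of RAM would be limited to a single job and might still run out of memory.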

Which set should we support: GPU_ARCHS or GPU_TARGETS?

Are other packages affected? And how do you think it would be best to deal with this?

Another question: would amd-clang improve the memory issues?
I noticed that Debian's clang has no support for parallel jobs:

-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS - Failed

I've seen that the Ubuntu llvm-toolchain-rocm package [2] builds clang-rocm.
Would composable-kernel be a good test case for testing improvements?


Regards

Christian B.


[1] https://github.com/ROCm/composable_kernel
[2] https://launchpad.net/~bullwinkle-team/+archive/ubuntu/rocm-devel

