ROCm GPU_TARGETS and GPU_ARCHS and some other points
Hello,
I'm currently working on the composable-kernel [1] package and have some
questions about GPU_TARGETS and GPU_ARCHS, which are used to build the
libraries.
I looked at how it is built in TheRock project and found that they only
build for
-DGPU_TARGETS="gfx1100;gfx1101;gfx1102"
This builds successfully, although it takes several hours.
Looking in the CMakeLists.txt, I found that I should use
GPU_ARCHS="gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201"
to build for all architectures without tests and examples:
---
#In order to build just the CK library (without tests and examples) for all supported GPU targets
#use -D GPU_ARCHS="gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201"
#the GPU_TARGETS flag will be reset in this case in order to avoid conflicts.
#
#In order to build CK along with all tests and examples it should be OK to set GPU_TARGETS to just 1 or 2 similar architectures.
---
The README.md adds some more information:
---
NOTE: If you try setting GPU_TARGETS to a list of architectures, the
build will only work if the architectures are similar, e.g.,
gfx908;gfx90a, or gfx1100;gfx1101;gfx1102. Otherwise, if you want to
build the library for a list of different architectures, you should use
the GPU_ARCHS build argument, for example
GPU_ARCHS=gfx908;gfx1030;gfx1100;gfx942.
---
This raises the question of how to reconcile this with the
rocm-target-arch tool, which gives
rocm-target-arch --sep ';'
gfx803;gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1010;gfx1030;gfx1100;gfx1101;gfx1102
while composable-kernel does not build for gfx803;gfx900;gfx906.
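
To make the mismatch concrete, here is a small Python sketch that compares the two lists quoted in this mail. It is not part of rocm-target-arch or of the CK build system; the function names are made up for illustration, and treating the list from the CMakeLists.txt comment as the complete set of buildable targets is an assumption.

```python
# Sketch only: both strings are copied from this mail, not queried from any tool.
DISTRO_TARGETS = (
    "gfx803;gfx900;gfx906;gfx908;gfx90a;gfx942;"
    "gfx1010;gfx1030;gfx1100;gfx1101;gfx1102"
)
# Assumption: the list in the CMakeLists.txt comment is what CK can build.
CK_GPU_ARCHS = (
    "gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201"
)


def split_targets(value):
    """Split a ';'-separated target string into a list, dropping empty items."""
    return [t for t in value.split(";") if t]


def partition_targets(distro, ck):
    """Return (kept, dropped): distro targets that are / are not in the CK list."""
    ck_set = set(split_targets(ck))
    kept, dropped = [], []
    for target in split_targets(distro):
        (kept if target in ck_set else dropped).append(target)
    return kept, dropped


if __name__ == "__main__":
    kept, dropped = partition_targets(DISTRO_TARGETS, CK_GPU_ARCHS)
    print("-DGPU_ARCHS=" + ";".join(kept))
    print("not in the CK list: " + ";".join(dropped))
```

With these two lists, gfx1010 ends up in the dropped set as well, simply because it is absent from the CMakeLists.txt comment. The sketch only compares names and says nothing about whether a given target actually compiles.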
I could do several builds on a per-architecture basis, which has the
advantage of also building the tests and examples, but this would create
conflicting per-architecture packages.
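
As a rough illustration of what per-architecture builds could look like, the following Python sketch splits a target list into groups of "similar" architectures, one GPU_TARGETS value per build. The similarity rule (same name with the last character removed) is purely my assumption; it happens to reproduce the two examples given in the README, but CK itself may define similarity differently.

```python
def family(target):
    """Heuristic family key (assumption): the target name minus its last character."""
    return target[:-1]


def group_by_family(targets):
    """Group a ';'-separated target list into one GPU_TARGETS string per family.

    Groups are returned in order of first appearance in the input.
    """
    groups = {}
    for target in targets.split(";"):
        if target:
            groups.setdefault(family(target), []).append(target)
    return [";".join(members) for members in groups.values()]


if __name__ == "__main__":
    all_archs = "gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201"
    for gpu_targets in group_by_family(all_archs):
        # One separate build (with tests and examples) per printed line.
        print('-DGPU_TARGETS="%s"' % gpu_targets)
```

For the list from the CMakeLists.txt comment this gives five groups, so five builds, and the packaging conflict between the resulting per-group libraries would still have to be solved separately.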
On the other hand, the build for all architectures sometimes takes more
than 40 GB per core, which will be difficult to run on the autobuilders.
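
To show what that memory figure means for a build machine, here is a minimal Python sketch that caps the number of parallel compile jobs by available memory. The 40 GB default is the peak I mention above; the machine sizes in the example are hypothetical, and the function is not taken from any existing build tool.

```python
def max_parallel_jobs(total_mem_gb, cores, per_job_gb=40):
    """Return how many compile jobs fit in memory, never more than `cores`.

    Always returns at least 1, so the build still runs (possibly swapping)
    on machines with less memory than a single job needs.
    """
    if per_job_gb <= 0:
        raise ValueError("per_job_gb must be positive")
    by_memory = int(total_mem_gb // per_job_gb)
    return max(1, min(cores, by_memory))


if __name__ == "__main__":
    # Hypothetical builders: (total memory in GB, number of cores).
    for mem_gb, cores in [(32, 8), (64, 32), (256, 16)]:
        print(mem_gb, "GB,", cores, "cores ->", max_parallel_jobs(mem_gb, cores), "job(s)")
```

Under these assumptions, a builder with 64 GB and 32 cores would be limited to a single job, which is why the all-architecture build looks hard to schedule on the autobuilders.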
Which of the two should we support: GPU_ARCHS or GPU_TARGETS?
Are other packages affected? And how do you think it would be
best to deal with this?
Another question: would amd-clang improve the memory issues?
I noticed that Debian's clang has no support for parallel jobs:
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS - Failed
I've seen that the Ubuntu llvm-toolchain-rocm package [2] builds clang-rocm.
Would composable-kernel be a good test case for testing improvements?
Regards
Christian B.
[1] https://github.com/ROCm/composable_kernel
[2] https://launchpad.net/~bullwinkle-team/+archive/ubuntu/rocm-devel