
2025-08-07 Meeting Notes



These are the notes that I attempted to take during today's meeting. Please post corrections if I misunderstood or missed something that was discussed.

Tim
----------

## migraphx
utkarsh has been working on this package

Problem: memory usage is very high when compiling. Compiling with llvm-19 works, but building with gcc in debuild causes memory issues.

To reproduce:

 * `git clone https://salsa.debian.org/rocm-team/migraphx.git`
 * `cd migraphx`
 * `uscan --force-download --download-current-version`
 * `debuild -us -uc`
 * This will fail at about 33%, with RAM completely full (16/16 GB)
 * using make to build instead of debuild does complete
 * it segfaults at around 30% completion in debuild
 * utkarsh's attempts to build for Fedora and Gentoo in containers have not had the same problem

suggestions:
 * `DEB_BUILD_OPTIONS=parallel=N`
	 * has been tried, did not help
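
For anyone retrying the reproduction with limited parallelism, here is a minimal sketch of deriving `N` from the memory that is actually available. The helper name `jobs_for_memory` and the 4096 MiB per-job figure are assumptions for illustration, not measurements from the migraphx build, and the notes above report that limiting parallelism did not help so far.

```shell
#!/bin/sh
# Hypothetical helper: pick a parallel job count that should fit in RAM.
#   $1 = available memory in MiB
#   $2 = assumed peak memory per compile job in MiB (a guess, not measured)
#   $3 = number of CPUs
# Prints the job count, clamped to the range 1..CPUs.
jobs_for_memory() {
    mem_mib=$1
    per_job_mib=$2
    cpus=$3
    jobs=$((mem_mib / per_job_mib))
    if [ "$jobs" -lt 1 ]; then
        jobs=1
    fi
    if [ "$jobs" -gt "$cpus" ]; then
        jobs=$cpus
    fi
    echo "$jobs"
}

# Example use (left as a comment so that sourcing this file has no side effects):
#   avail=$(awk '/MemAvailable/ {print int($2 / 1024)}' /proc/meminfo)
#   n=$(jobs_for_memory "$avail" 4096 "$(nproc)")
#   DEB_BUILD_OPTIONS="parallel=$n" debuild -us -uc
```

If the build still exhausts memory with `parallel=1`, that would point at a single translation unit rather than at the job count.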

No conclusion here; folks will need to dig into it more and follow up later.

## single GPU target building

dbcsr can only be built against a single ISA at a time - how do we get a package that can work with multiple ISAs?

libdbcsr.a is the only part that has ISA-specific functions. 

possible solutions:

 * create one binary for each ISA
 * there are configuration json files for different ISAs
	 * this means that it's not possible to use cmake to support multiple ISAs?

This has been brought up in upstream github: https://github.com/cp2k/dbcsr/discussions/933

proposed solution:

 - create new cmake flag to allow building for multiple ISAs
 - when the build gets to a point where it is doing ISA specific bits, create one libdbcsr.a per ISA specified in the new cmake flag
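
As a rough illustration of the "one libdbcsr.a per ISA" idea, the sketch below only prints the commands a per-ISA loop might run; it does not run them. The option name `HYPOTHETICAL_GPU_ISA`, the `build-<isa>` directories, and the `src/libdbcsr.a` and `lib/<isa>/` paths are all placeholders invented for this example. They are not existing dbcsr options or paths.

```shell
#!/bin/sh
# Hypothetical sketch: one build tree, and hence one libdbcsr.a, per ISA.
# HYPOTHETICAL_GPU_ISA is a placeholder, NOT an existing dbcsr CMake option;
# the library locations are likewise assumptions made for illustration.
plan_per_isa_builds() {
    for isa in "$@"; do
        # Configure a separate build directory for this ISA.
        echo "cmake -S . -B build-$isa -DHYPOTHETICAL_GPU_ISA=$isa"
        # Build it.
        echo "cmake --build build-$isa"
        # Keep each static library under an ISA-specific directory.
        echo "install -D build-$isa/src/libdbcsr.a lib/$isa/libdbcsr.a"
    done
}

# Example use: plan_per_isa_builds gfx90a gfx1100
```

A flag inside upstream's CMake would presumably avoid rebuilding the ISA-independent parts for every target, which this outer loop does not.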

kokkos has the same problem. Fedora solves it using the bash module system

 * binaries have the same name but different paths per ISA
 * users have to use commands like `module load gfx1100` to load the correct paths for the desired ISA
 * this means that any dependent packages essentially have to do the same thing and loop over all of the ISA targets
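
The "same name, different path per ISA" layout can be sketched as a search-path lookup. The `/opt/example/<isa>/lib` prefix and the helper name `prepend_isa_path` are made up for this example and are not Fedora's actual layout.

```shell
#!/bin/sh
# Hypothetical layout: every ISA gets its own directory holding identically
# named libraries, e.g. /opt/example/gfx1100/lib/libfoo.so (made-up prefix).
#   $1 = ISA name
#   $2 = existing colon-separated search path (may be empty)
# Prints the search path with the ISA-specific directory placed first.
prepend_isa_path() {
    dir="/opt/example/$1/lib"
    if [ -n "$2" ]; then
        echo "$dir:$2"
    else
        echo "$dir"
    fi
}

# Example use (roughly what loading a per-ISA module would do):
#   LD_LIBRARY_PATH=$(prepend_isa_path gfx1100 "$LD_LIBRARY_PATH")
#   export LD_LIBRARY_PATH
```

A dependent package built against such a layout needs one build per ISA directory, which is the looping cost mentioned in the last bullet.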

Another possible solution:

  * have a CPU only version
  * choose which ISA-specific package to install if you want GPU capability
  * assuming the ABI is the same for CPU/GPU, have the packages conflict with each other
  * this would have similar problems to how Fedora solved the kokkos problem with modules

Concern: Even if we're able to convince dbcsr upstream to make changes to build for multiple ISAs, there will eventually be an upstream that has no interest in doing this. How do we want to handle the general problem?

Action items:

 * spaarsh will update the discussion with dbcsr upstream
 * utkarsh will look at Fedora's kokkos package to evaluate whether that solution will work well enough.

Any proposals will need to be discussed with the debian-science folks as kokkos will be a team-maintained package.

## gfx1201 support?

Support for gfx1201 was added in rocm 6.4.2

cory is working to update the compiler to 6.4.2 but all libraries need to be updated to 6.4.2 for gfx1201 support to fully land

As an aside - there are some missing packages if we want to build the latest pytorch with ROCm support

 * hipblaslt
 * hipsparselt

For folks who want to help: please start looking at any of the ROCm math libraries that are not already updated to 6.4.2 and work on getting them updated.

Hopefully the rocm compiler update to 6.4.2 will be done by the end of this week. The runtime will be a bigger lift once the compiler update lands.

There was a question around whether the current method of building rocm-llvm on top of the system llvm would continue to be workable as we expand the number of ROCm components packaged in Debian.


## hipcxx proposal

relevant mailing list thread: https://lists.debian.org/debian-ai/2025/07/msg00105.html

Some llvm folk have not responded positively to this proposal, so the path forward is unclear.

More discussion on list would be appreciated.

## AMD's clang fork

relevant mailing list thread: https://lists.debian.org/debian-ai/2025/06/msg00105.html

Note: there are still open questions around this proposal which need to be addressed; responses should be coming soon.

Fedora's approach - use the AMD clang fork but put it in a private path (/usr/lib64/rocm/llvm) so that it doesn't conflict with the system llvm.

more discussion on list will be needed

Related topic: The licensing of Debian's llvm package metadata (maintained by the Debian llvm team) doesn't match the licensing of the rocm-llvm package metadata (maintained by the ROCm team). We should start working on getting the existing rocm-llvm metadata relicensed to match the system llvm package to make collaboration/sharing easier.


## kde alpaka

Relevant links:

 * https://apps.kde.org/alpaka/
 * https://ftp-master.debian.org/new/alpaka_0.1.1~git20250716.6da6dd61-1.html

An initial package has been submitted. Open question: will it have AMD accelerator support?

The app is still in an early development phase

## gloo

spaarsh has been working on gloo

 * gloo uses/used old ROCm functions, spaarsh has been working to update those calls
 * package builds successfully but there are warnings about undefined symbols
 * gloo takes CUDA code and has some custom Python scripts that essentially use hipify for the ROCm package
 * spaarsh is asking for help on fixing these issues, please reply to list
	 * https://lists.debian.org/debian-ai/2025/06/msg00135.html


