
Re: Using hwcaps instead of simde dispatcher scripts?





On 26/07/2025 19.32, Nilesh Patra wrote:
> Hi Michael,

Hey Nilesh!

> I recently came across Christian's blog[1] (thanks for this!) where they
> use hwcaps[2] to select the appropriate cpu capabilities for ggml package[3].
>
> This is similar to what we do when we add simde patches to a package and
> build for all ISAs starting from baseline SSE2 until AVX2 (or even AVX512), and
> we end up writing a home-grown script like this [4] to select appropriate cpu capabilities.
>
> Do you think we can get rid of these scripts going forward and instead use hwcaps?

For amd64, I agree that targeting the x86_64-v{1,2,3,4} micro-architectures is a better idea than all the SSE*/AVX* variants that others and I have been building in the past.
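To make the four levels concrete, here is a rough sketch of how a CPU's level could be derived from the "flags" line of /proc/cpuinfo. The per-level flag lists are my reading of the x86-64 psABI micro-architecture levels, written with the Linux flag spellings (for example "pni" for SSE3 and "abm" for LZCNT). The function names are made up for illustration and do not come from any existing Debian tool.

```python
# Sketch only: flag names are the spellings used in /proc/cpuinfo on Linux.
# The grouping follows the x86-64 psABI levels as understood here; treat it
# as an assumption and check it against the psABI before relying on it.
LEVEL_FLAGS = [
    (1, {"lm", "cmov", "cx8", "fpu", "fxsr", "mmx", "syscall", "sse", "sse2"}),
    (2, {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    (3, {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}),
    (4, {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]


def x86_64_level(flags):
    """Return the highest level whose flags, and all lower levels' flags, are present."""
    present = set(flags)
    level = 0
    for n, required in LEVEL_FLAGS:
        if not required <= present:
            break  # levels are cumulative, so stop at the first gap
        level = n
    return level


def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the flag set of the first CPU listed in /proc/cpuinfo (empty if none)."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()
```

On a Linux amd64 machine, `x86_64_level(read_cpu_flags())` gives the number N to use in x86_64-vN; a result of 0 means not even the baseline flags were found.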

Using hwcaps for dynamically-linked scientific computing libraries is a great idea, yes! I recommend improving the documentation at https://wiki.debian.org/InstructionSelection#hwcaps with concrete Debian-specific examples (or perhaps linking to a new wiki page if that gets too long).

**Note**: For applications with functions that benefit from the more advanced CPU capabilities, hwcaps will only work if those functions are compiled to a separate dynamically loaded library (which might be part of the main Debian package for that application, or a shared library package).
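To illustrate why only shared libraries benefit: the dynamic loader looks for a library in glibc-hwcaps/x86-64-vN subdirectories of each library directory before falling back to the plain directory. The sketch below is a simplified model of that preference order, not ld.so's actual implementation. The library name libfoo.so.1 and the function name are hypothetical.

```python
from pathlib import PurePosixPath


def hwcaps_candidates(libdir, soname, level):
    """List where a library may be found, best match first.

    Simplified model: glibc defines hwcaps subdirectories for x86-64-v2 and
    above; the baseline build lives directly in the library directory.
    """
    base = PurePosixPath(libdir)
    paths = [
        base / "glibc-hwcaps" / f"x86-64-v{n}" / soname
        for n in range(level, 1, -1)
    ]
    paths.append(base / soname)  # baseline fallback, always last
    return [str(p) for p in paths]


for path in hwcaps_candidates("/usr/lib/x86_64-linux-gnu", "libfoo.so.1", 3):
    print(path)
```

A statically linked binary, or one whose hot loops live in the executable itself, never goes through this lookup, which is why it gains nothing from hwcaps.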

Unfortunately, I think that many of the packages from the scientific Debian Blends teams don't put their performance-critical functions in a dynamically loaded library, and thus would NOT benefit from the GLIBC 2.33+ hwcaps feature. Using your example of the "scrappie" Debian package, we see that it ships only binaries and no dynamic libraries: https://packages.debian.org/sid/amd64/scrappie/filelist (package page: https://packages.debian.org/unstable/scrappie).

I would love to see a generic Debian dispatcher script that could be used for amd64 systems (and eventually arm64 & riscv64 systems) to select between binaries using a similar naming scheme to GLIBC hwcaps, but anchored in /usr/bin/ (/usr/bin/x86_64-v[1234]/* ?)

For binary selection, we could add a script to https://tracker.debian.org/pkg/subarch-select which would be symlinked from /usr/bin/app-name and would use subarch-select to choose between /usr/bin/x86_64-v[1234]/app-name based upon the current CPU's capabilities.

Likewise, I would love to see shared helpers for d/rules for building both shared library packages and single-binary packages. These would automate the multiple builds and multiple installation locations needed, thus simplifying the work required to take full advantage of GLIBC hwcaps and/or the Debian-wide shared dispatcher script mentioned above. (Some packages might have critical code in both an application binary and shared libraries, and would thus benefit from using both of the multi-build approaches outlined above.)
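A minimal sketch of what such a shared dispatcher could look like, assuming the /usr/bin/x86_64-vN/ layout floated above. How the CPU level is determined is deliberately left out: it is passed in as a number, since the interface subarch-select would offer for this is not settled here. The function names are hypothetical.

```python
import os
import sys


def pick_binary(app, level, root="/usr/bin"):
    """Return the best executable build of `app` at or below `level`, or None."""
    for n in range(level, 0, -1):
        candidate = os.path.join(root, f"x86_64-v{n}", app)
        if os.access(candidate, os.X_OK):
            return candidate
    return None


def dispatch(argv, level, root="/usr/bin"):
    """Replace the current process with the best matching build.

    The application name is taken from argv[0], so one dispatcher can be
    symlinked from /usr/bin/app-name for any number of applications.
    """
    app = os.path.basename(argv[0])
    target = pick_binary(app, level, root)
    if target is None:
        sys.exit(f"{app}: no build for x86_64-v{level} or lower under {root}")
    os.execv(target, argv)  # keeps the original argv[0] and arguments
```

Falling back to lower levels means a package only has to ship the variants that are worth building; a machine at level 4 still runs the v3 build if no v4 build exists.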

For riscv64, I would suggest using the RISC-V Application Profiles (RVA{20,22,23}) in the same way that the x86_64-v{1,2,3,4} micro-architectures are used on amd64; but this is not yet supported by GLIBC. However, Debian could support them in the same way that I suggest above for amd64 in /usr/bin/x86_64-v[1234]/*, perhaps using /usr/bin/riscv64-RVA{20,22,23}/*.

For arm64, I think this would require a bit more research. I'm not sure that subsequent ARMv{8,9} revisions are strictly followed as I've noticed that ARM suggests checking for specific CPU features and not for architecture revisions like "ARMv8.6".

Thank you for your nice email on a favorite subject of mine :-)


[1] https://www.kvr.at/posts/easy-dynamic-dispatch-using-GLIBC-hardware-capabilities/
[2] https://manpages.debian.org/unstable/manpages/ld.so.8.en.html#x86~2
[3] https://salsa.debian.org/deeplearning-team/ggml/-/commit/5768f4319d8b547fffb027c78e6dea4453a1e3c9
[4] https://salsa.debian.org/med-team/scrappie/-/blob/master/debian/bin/simd-dispatch?ref_type=heads
