Re: Futhark on ROCm CI
Hi Kari,
Cory already answered most of the questions, so I'll just expand on some
points:
On 2024-11-04 09:42, Kari Pahula wrote:
> I've enabled autopkgtest tests for ROCm and I have some questions.
>
> What's the exact condition for a package to be picked by the CI? I
> saw that haskell-futhark showed up on it even before I had any
> autopkgtest files defined.
TL;DR: our scheduler will automatically schedule tests for any source
package in the official Archive with a d/control that has a Depends on a
ROCm package.
This behavior is incorrect: our scheduler should not be looking at
d/control but at d/tests/control [6]. This is a limitation of the
underlying libs and I haven't found the time to work around it yet.
Cory already linked to the scheduler repo with the full details, but I
just noticed that said repo is missing the issues that were filed before
the scheduler was factored out, notably [6]. I wish these could be
re-assigned or cloned.
(Here's the weird thing though: at first glance, I can't really tell
which binary of src:haskell-futhark depends on a ROCm package and is
thus triggering the test... I need to extend logging.)
> I'm thinking of packaging
> futhark-benchmarks next and have them run as well and I'd like to know
> what I'd need to do in debian/control to get things rolling. Would a
> recommend on futhark alone do it, via some transitive magic?
The scheduler currently only considers Depends. But wouldn't
futhark-benchmarks need to depend on, rather than recommend, futhark
anyway?
The scheduler supports transitive logic but only for the ROCm packages
(the "Wanted" list). So a change to a package cascades down to its
dependent children, and so on. The scheduler terminates this cascade at
the first non-ROCm package.
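To make the cascade concrete, here is a minimal Python sketch of the rule
as described above. It is not the scheduler's actual code: the function
name, the reverse-dependency map and the placeholder packages "rocm-a",
"rocm-b" and "app-x" are all made up for illustration, and whether the
changed package itself is also tested is left out.

```python
from collections import deque


def packages_to_test(changed, reverse_depends, wanted):
    """Return the dependents that a change to `changed` would schedule.

    reverse_depends maps a package to the packages that Depend on it.
    wanted is the set of ROCm packages (the "Wanted" list); only these
    propagate the cascade further.
    """
    scheduled = set()
    seen = {changed}
    queue = deque([changed])
    while queue:
        pkg = queue.popleft()
        for child in reverse_depends.get(pkg, ()):
            if child in seen:
                continue
            seen.add(child)
            scheduled.add(child)
            # A non-ROCm child is still tested, but the cascade stops there.
            if child in wanted:
                queue.append(child)
    return scheduled


if __name__ == "__main__":
    # Hypothetical dependency data, for illustration only.
    rdeps = {
        "rocm-a": ["rocm-b", "futhark"],
        "rocm-b": ["app-x"],
        "futhark": ["futhark-benchmarks"],
    }
    wanted = {"rocm-a", "rocm-b"}
    print(sorted(packages_to_test("rocm-a", rdeps, wanted)))
    # prints ['app-x', 'futhark', 'rocm-b']
```

In this toy data futhark is reached because it depends directly on a
ROCm package, while futhark-benchmarks is not, because futhark is not on
the "Wanted" list.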
> futhark-benchmark (not yet even ITP'd) would be a bunch of Futhark
> source files to be placed under /usr/src/.
The scheduler would currently not pick this up because of the
termination above. Either I fix [6] and you adjust futhark-benchmarks'
d/tests/control accordingly, or we add futhark-benchmarks to the
"Wanted" list that Cory mentioned.
> Any suggestions on how to locally test autopkgtest scripts? I tried
> it with an sbuild setup and that didn't have HSA available in it with
> no relevant dev files defined.
Cory correctly pointed to pkg-rocm-tools; its documentation is here [7].
It's still in NEW but you can grab it from our own APT repo as well [8].
It's probably easier to use the rootless podman backend. QEMU requires
that the GPU be assigned to VFIO, which means it will no longer be
available for graphical output on the host.
> I copied over some artifact gathering and the /dev/kfd skip test from
> other HIP tests but I'm not liking this code duplication. Could we
> put it in /usr/share/rocm/autopkgtest/ and then I could've used
> something like [...]
This is rocm-test-launcher in pkg-rocm-tools that Cory mentioned.
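For readers who have not seen the /dev/kfd skip check, this is roughly
what such a guard does. The sketch below is a generic Python
illustration, not rocm-test-launcher's implementation or interface; the
function names are made up. It assumes the test declares the
"skippable" restriction in d/tests/control, in which case autopkgtest
treats exit status 77 as "skipped".

```python
import os
import sys

# With "Restrictions: skippable", autopkgtest reports exit status 77 as a skip.
SKIP_EXIT_CODE = 77


def gpu_available(kfd_path="/dev/kfd"):
    """True if the compute device node exists and is readable and writable."""
    return os.path.exists(kfd_path) and os.access(kfd_path, os.R_OK | os.W_OK)


def main(kfd_path="/dev/kfd"):
    if not gpu_available(kfd_path):
        print(f"{kfd_path} is not usable, skipping test", file=sys.stderr)
        return SKIP_EXIT_CODE
    # The actual test commands would run here.
    return 0


if __name__ == "__main__":
    sys.exit(main())
```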
> Is there some way to define a custom timeout for the CI run? The
> gfx1011 test I linked above took 9 hours and this is embarrassing.
> Even 2 hours maximum would be excessive for these under any
> circumstances.
Only globally, by means of autopkgtest's --timeout-* options. debci
does not yet support package-specific timeouts.
> I'll wrap this up with a motivating example of what Futhark is good
> for. I have a toy program that computes force directed graphs for
> https://piperka.net/map/. Basically it's an ad hoc O(n^2) n-body
> simulation in 2d space. I have a small C program that does the work
> and I implemented the core part of it as a GPU program with Futhark
> like this:
> https://gitlab.com/piperka/forcelayout/-/tree/tmp/futhark-not-yet-working
>
> Don't mind the branch name, it's working after the bugfix commit. If
> someone reads this in the future I may have deleted the branch but the
> code will either be in master or some other branch then.
>
> This was my first serious use of Futhark and moving to use it was
> simple enough for an experienced Haskell coder like me (not a too
> uncommon skill). My GPU is nothing too fancy (a W6600) and my Futhark
> version ran under 10s compared to the 24s of my original CPU version
> (on a Ryzen 9 7900X). There's a Python interface too I haven't
> tested.
Have you tried larger problems? Computation on the GPU carries
overheads, some of them fixed (e.g. host<->GPU memory transfers), that
amortize better on bigger problems. On problems that parallelize well,
~10x-20x speed-ups should be reasonable. (Maybe even higher, it's been a
while since my last deep dive and both problems and GPUs have evolved).
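A back-of-the-envelope model of the amortization, as a Python sketch. It
assumes a purely fixed overhead and a constant speed-up of the compute
part, which is a simplification (transfer costs usually grow with the
data too). The numbers are invented for illustration and are not
measurements of anything.

```python
def effective_speedup(cpu_seconds, kernel_speedup, fixed_overhead_seconds):
    """Overall speed-up once a fixed GPU overhead is added to the kernel time."""
    gpu_seconds = fixed_overhead_seconds + cpu_seconds / kernel_speedup
    return cpu_seconds / gpu_seconds


if __name__ == "__main__":
    # Same hypothetical 20x kernel and 1 s overhead, two problem sizes.
    for cpu_seconds in (10.0, 100.0):
        speedup = effective_speedup(cpu_seconds, 20.0, 1.0)
        print(f"CPU {cpu_seconds:.0f} s -> overall {speedup:.1f}x")
```

In this model the overall speed-up approaches the kernel speed-up only
as the problem grows; for small problems the fixed overhead dominates.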
Best,
Christian
[6]: https://salsa.debian.org/rocm-team/pkg-rocm-tools/-/issues/15#note_480293
[7]: https://salsa.debian.org/rocm-team/pkg-rocm-tools/
[8]: https://apt.rocm.debian.net/