
Futhark on ROCm CI



I've enabled autopkgtest tests for ROCm and I have some questions.

What's the exact condition for a package to be picked up by the CI?  I
saw that haskell-futhark showed up on it even before I had any
autopkgtest files defined.  I'm thinking of packaging
futhark-benchmarks next and having its tests run as well, and I'd like
to know what I'd need to put in debian/control to get things rolling.
Would a Recommends on futhark alone do it, through some transitive
magic?  futhark-benchmarks (not even ITP'd yet) would be a bunch of
Futhark source files placed under /usr/src/.

If you're curious, the results are at
https://ci.rocm.debian.net/packages/h/haskell-futhark/.  I'm not
worrying about the failures for now; I'll refine the tests in later
uploads.  Upstream suggested at least one command-line flag for them
that I still want to try.  The important part is that the HIP tests
are succeeding on at least one architecture, for example
https://ci.rocm.debian.net/packages/h/haskell-futhark/unstable/amd64+gfx1032/39566/

It looks like Futhark's tests are good at stress testing the drivers
and the HSA layer.  The suite has many small tests that a GPU should
have no trouble running in parallel with little memory use.  For
example, in
https://ci.rocm.debian.net/packages/h/haskell-futhark/unstable/amd64+gfx1035/39659/
one test got an error like "Memory access fault by GPU node-1
(Agent handle: 0x55b623418c20) on address 0x7fa60a57a000. Reason: Page
not present or supervisor privilege."

I also had a GPU hang in
https://ci.rocm.debian.net/packages/h/haskell-futhark/unstable/amd64+gfx1011/39621/

Currently, I have enabled three backends for Futhark's tests:
multicore (CPU only), OpenCL+POCL (CPU only) and HIP.  The CPU-only
tests are valid as such, but I doubt running them on these machines is
very useful.  I could make them skippable and skip them in a ROCm CI
environment.  Is there a way to detect that a test is running in one?
Simply reversing the /dev/kfd check seems wrong to me.
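One way the skipping could look, as a sketch: give each CPU-only backend its own test in debian/tests/control with "Restrictions: skippable", since autopkgtest records a skippable test that exits with status 77 as skipped rather than failed.  What to key the skip on is still the open question; FUTHARK_SKIP_CPU_TESTS below is a hypothetical variable that nothing is known to set, and the function names are made up for illustration.

```sh
#!/bin/sh
# Sketch only: FUTHARK_SKIP_CPU_TESTS is a hypothetical variable; nothing
# (ROCm CI included) is known to set it.

# True when the environment asks for CPU-only backends to be skipped.
skip_cpu_tests_requested() {
    [ -n "${FUTHARK_SKIP_CPU_TESTS:-}" ]
}

# Run Futhark's test suite with one CPU-only backend (e.g. multicore, or
# opencl under POCL). Returns 77 when skipping, which autopkgtest reports
# as "skipped" for tests declared with "Restrictions: skippable".
run_cpu_backend_tests() {
    backend=$1
    testdir=$2
    if skip_cpu_tests_requested; then
        echo "skipping CPU-only backend $backend" >&2
        return 77
    fi
    futhark test --backend="$backend" --notty "$testdir"
}
```

A test script would then end with something like run_cpu_backend_tests multicore "$AUTOPKGTEST_TMP/tests"; exit $?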

I'll enable the OpenCL+ROCm test on a future upload.

Any suggestions on how to test autopkgtest scripts locally?  I tried
an sbuild setup, but HSA wasn't available in it because the relevant
/dev files weren't present.
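Two options that might help.  autopkgtest's null virtualization server runs the tests directly on the host (e.g. autopkgtest ./ -- null from the source tree), so /dev/kfd should be visible, at the cost of no isolation at all.  For quick iteration on a single script, the sketch below runs it by hand with the AUTOPKGTEST_TMP and AUTOPKGTEST_ARTIFACTS directories that autopkgtest would normally provide.  The helper name is hypothetical, and it assumes you call it from the top of the unpacked source tree, which is where autopkgtest starts tests.

```sh
#!/bin/sh
# Hypothetical helper: run one debian/tests script directly on the host,
# outside any testbed. Call it from the top of the unpacked source tree,
# since the test scripts use relative paths such as "tests".
run_autopkgtest_script_locally() {
    script=$1
    tmp=$(mktemp -d) || return 1
    artifacts=$(mktemp -d) || return 1
    status=0
    # Provide the two directories autopkgtest normally sets up for a test.
    AUTOPKGTEST_TMP=$tmp AUTOPKGTEST_ARTIFACTS=$artifacts sh "$script" || status=$?
    echo "$script: exit status $status, artifacts kept in $artifacts" >&2
    rm -rf "$tmp"
    return "$status"
}
```

For example, run_autopkgtest_script_locally debian/tests/hip (assuming a test by that name) on a machine with a GPU.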

I copied the artifact gathering and the /dev/kfd skip check over from
other HIP tests, but I don't like this code duplication.  Could we put
them in /usr/share/rocm/autopkgtest/?  Then I could use something
like:

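#! /bin/sh
# Proposed shared helpers (these files don't exist yet):
# the prelude would skip the test when /dev/kfd is missing,
# the postlude would gather artifacts for the CI logs.
/usr/share/rocm/autopkgtest/prelude || exit $?
EXITCODE=0
cp -r tests "$AUTOPKGTEST_TMP"

futhark test --backend=hip --notty "$AUTOPKGTEST_TMP/tests" || EXITCODE=1
/usr/share/rocm/autopkgtest/postlude
exit $EXITCODE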

You probably have a better idea of which package should own that.
Would it make sense?
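To make the proposal more concrete, here is a rough sketch of what such a shared file could contain, written as functions to be sourced rather than as separate executables.  Everything in it is hypothetical: the file and function names, the KFD_DEVICE override (there only to make the skip path testable) and the choice of the kernel log as the artifact.  It does not reproduce what the existing HIP tests actually collect.

```sh
#!/bin/sh
# Hypothetical contents of a shared file such as
# /usr/share/rocm/autopkgtest/functions.sh, meant to be sourced by tests.
# Function names and the KFD_DEVICE override are made up for illustration.

# Skip the test when the ROCm kernel interface is missing. Exit status 77
# is reported as "skipped" for tests with "Restrictions: skippable".
rocm_prelude() {
    kfd=${KFD_DEVICE:-/dev/kfd}
    if [ ! -e "$kfd" ]; then
        echo "$kfd not present, skipping GPU test" >&2
        return 77
    fi
    return 0
}

# Save diagnostics where the CI can publish them. The kernel log is only an
# example; it may not be readable by the test user, so failures are ignored.
rocm_postlude() {
    [ -n "${AUTOPKGTEST_ARTIFACTS:-}" ] || return 0
    dmesg > "$AUTOPKGTEST_ARTIFACTS/dmesg.log" 2>&1 || true
    return 0
}
```

A test would then start with ". /usr/share/rocm/autopkgtest/functions.sh" followed by "rocm_prelude || exit $?", and call rocm_postlude before exiting.  Sourcing instead of executing would also let the helpers set variables for the calling script.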

Is there some way to define a custom timeout for the CI run?  The
gfx1011 test I linked above took 9 hours, which is embarrassing.
Even a 2-hour limit would be excessive for these under any
circumstances.
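Until there is a per-package setting, one workaround that stays inside the test script is to cap the run with coreutils timeout, which exits with status 124 when the limit is reached (typically 137 if it had to fall back to SIGKILL via --kill-after).  The function name and the limits are arbitrary examples.  A process stuck in uninterruptible sleep after a GPU hang may not die even from SIGKILL, so this is not guaranteed to help with the gfx1011 case.

```sh
#!/bin/sh
# Sketch: cap a command's run time with coreutils timeout(1).
# Usage: run_with_limit DURATION COMMAND [ARGS...]
run_with_limit() {
    limit=$1
    shift
    status=0
    # Send SIGTERM at the limit, then SIGKILL 60 seconds later if the
    # command is still running.
    timeout --kill-after=60 "$limit" "$@" || status=$?
    case $status in
        124) echo "timed out after $limit: $*" >&2 ;;
        137) echo "killed by SIGKILL: $*" >&2 ;;
    esac
    return "$status"
}
```

In the script above this would become run_with_limit 2h futhark test --backend=hip --notty "$AUTOPKGTEST_TMP/tests" || EXITCODE=1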


I'll wrap this up with a motivating example of what Futhark is good
for.  I have a toy program that computes force directed graphs for
https://piperka.net/map/.  Basically it's an ad hoc O(n^2) n-body
simulation in 2D space.  I have a small C program that does the work
and I implemented the core part of it as a GPU program with Futhark
like this:
https://gitlab.com/piperka/forcelayout/-/tree/tmp/futhark-not-yet-working

Don't mind the branch name; it works after the bugfix commit.  If
someone reads this in the future, I may have deleted the branch, but
the code will then be in master or some other branch.

This was my first serious use of Futhark, and moving to it was simple
enough for an experienced Haskell coder like me (not too uncommon a
skill).  My GPU is nothing fancy (a W6600), and my Futhark version ran
in under 10 s compared to the 24 s of my original CPU version (on a
Ryzen 9 7900X).  There's also a Python interface that I haven't
tested.  I know LLMs have stolen all the hype, but I'd like to have
this option available in Debian.

