
Upload of temporary fork of debci



Hi all,

I'm happy to say that the first workable version of the debci tooling
with GPU support is ready and has been uploaded to our APT archive. This
complements the GPU support introduced by rocm-dev-tools and added to
the autopkgtest fork.

From my POV, we now have all the tooling necessary for running our own
CI with GPU support. (In fact, it's already running on my end; I just
want to write it up a bit, and I'm too tired for that today.)

Mid-to-long-term, the goal is to merge all of this back into debci, of
course, once it has proven itself.

Concept
=======

In this fork, the Architecture field can be extended by modifiers, for
example amd64+gfx1030. A test scheduled for this architecture would
(only) run on an amd64 worker with matching modifier.

So all we have to do now is find workers with GPUs, and add them to the
network with a declaration of their supported arch+modifier. For
example, I've added my box with the RX 6800 XT (gfx1030) as an
amd64+gfx1030 worker.
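
As a rough illustration of the routing rule described above (this is not
code from the fork, which extends debci itself), the following Python
sketch splits an extended Architecture value into its base and modifier
and selects workers by exact match. The names ArchSpec, parse_arch and
eligible_workers, and the worker names, are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class ArchSpec(NamedTuple):
    """Hypothetical container for an extended Architecture value."""
    base: str                 # e.g. "amd64"
    modifier: Optional[str]   # e.g. "gfx1030", or None if absent


def parse_arch(value: str) -> ArchSpec:
    """Split 'amd64+gfx1030' into base and modifier at the first '+'."""
    base, sep, modifier = value.partition("+")
    if not base or (sep and not modifier):
        raise ValueError(f"malformed architecture: {value!r}")
    return ArchSpec(base, modifier or None)


def eligible_workers(test_arch: str, workers: Dict[str, str]) -> List[str]:
    """Return workers whose declared arch+modifier equals the test's.

    Assumption: matching is plain equality on the full string, so a
    test for amd64+gfx1030 only reaches workers declared as such.
    """
    return sorted(name for name, arch in workers.items() if arch == test_arch)


# Hypothetical worker declarations (worker name -> declared architecture).
workers = {
    "cpu-box": "amd64",
    "rx6800xt-box": "amd64+gfx1030",
}

if __name__ == "__main__":
    print(parse_arch("amd64+gfx1030"))
    print(eligible_workers("amd64+gfx1030", workers))
```

Treating the whole string as the key keeps scheduling simple: no
component needs to understand the modifier, it only has to compare
values.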

The approach published today is quite simple. I originally wanted to
introduce the modifier as its own dimension (so that amd64+gfx1030 is a
specialization of amd64), but had to face design decisions left and
right, and I didn't feel comfortable launching a prototype like that.

The prototype published today was easy to create, and I'm sure that what
we'll learn from it can inform a proper design process. The only
drawback is that amd64 and amd64+gfx1030 are distinct architectures,
rather than the latter being a specialization.
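
To make the drawback concrete, here is a small Python sketch contrasting
the two matching rules. Both functions are hypothetical, and the
"specialized" rule is only one possible reading of what a separate
modifier dimension could mean for scheduling (a GPU worker also
accepting plain amd64 tests); the email does not specify this.

```python
def matches_exact(worker_arch: str, test_arch: str) -> bool:
    """Prototype behaviour: the two values are distinct architectures."""
    return worker_arch == test_arch


def matches_specialized(worker_arch: str, test_arch: str) -> bool:
    """Assumed alternative: arch+modifier is a specialization of arch.

    A worker can take a test if the base architectures agree and the
    test either requests no modifier or requests the worker's modifier.
    """
    w_base, _, w_mod = worker_arch.partition("+")
    t_base, _, t_mod = test_arch.partition("+")
    if w_base != t_base:
        return False
    return t_mod == "" or t_mod == w_mod


if __name__ == "__main__":
    # A GPU worker offered a plain amd64 test under each rule.
    print(matches_exact("amd64+gfx1030", "amd64"))
    print(matches_specialized("amd64+gfx1030", "amd64"))
```

Under exact matching, a GPU worker never picks up plain amd64 tests and
a plain amd64 worker never picks up GPU tests; under the assumed
specialization rule only the second restriction would remain.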

Backends
========

The above addresses the 'debci' conceptual view. It allows debci to
*enqueue* the right tests. However, the current debci backends cannot be
used to *run* the tests, as the backends are not GPU-aware.

To this end, leaning on the already existing debci 'qemu' backend, I've
added a 'qemu+rocm' backend which can be used to run tests in QEMU VMs.
This was based on all the work I did for rocm-dev-tools, and it works
flawlessly on my end. Package rocm-qemu-support has all the tooling
needed for setting this up, and doing so is quite trivial.

I'm going to add a podman backend soon.

Next steps
==========

I'll present how all of this works in my next email, which will
hopefully follow on Tuesday. I'm still facing some minor hiccups, and
tomorrow will be a crazy work day.

Notes
=====

I've forked the debci repository [1] and implemented the above in the
'arch-variant' branch.

Best,
Christian

[1] https://salsa.debian.org/rocm-team/debci

