
Uploaded rocm-dev-tools



Hi all,

On 2023-07-18 18:05, Christian Kastner wrote:
> I'm happy to announce that apt.rocm.debian.net [1] is up and running.
> 
> Through this server, we can distribute packages that are either
>   (1) Our own tooling, not yet ready for upload to the official Archive

TL;DR apt.r.d.n now has packages shipping utilities that vastly simplify
using GPUs inside VMs or containers, either directly or via autopkgtest.
These should pave the way towards launching our CI infra.


I've just uploaded a new package, rocm-dev-tools, to [apt.r.d.n] unstable;
the Salsa project is here [1].

This currently builds two packages:
  * rocm-qemu-support    Utilities for working with ROCm inside QEMU VMs
  * rocm-podman-support            ... inside rootless podman containers

These packages facilitate working with GPUs in environments isolated
from the host system. This "solves" point (2) listed in my strategy RFC
[2], and thus is an important step towards the goal of eventually
running on the official infra.

I had a prototype of our CI infra up and running a while ago, but then
faced the big hurdle of how to enable others to add workers.

These workers need to be configured for isolated GPU use, they need to
be able to build and/or work with images, and they need specialized
autopkgtest backends.

I've previously shared notes on how to do this on the wiki, but the
process was cumbersome and error-prone, so I was concerned that it
presented too much of a barrier to entry.

So the utilities below take care of all of this.
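To give a flavour of what such configuration involves: for container use,
the user running a worker needs access to the GPU device nodes, which on
Debian are typically owned by the render and video groups. A minimal check
could look like this. Note that `in_groups` and `USER_NAME` are hypothetical
names for illustration; this is a sketch, not what rocm-podman-setup does
internally.

```sh
# Hypothetical check, not part of rocm-dev-tools: succeed if USER_NAME
# (default: the current user) is a member of every group given as argument.
in_groups() {
    if [ -n "${USER_NAME:-}" ]; then
        member_of=" $(id -nG "$USER_NAME") "
    else
        member_of=" $(id -nG) "
    fi
    for g in "$@"; do
        # Pad with spaces so that e.g. "video" does not match "videoxyz"
        case "$member_of" in
            *" $g "*) ;;
            *) echo "${USER_NAME:-current user} is not in group $g" >&2
               return 1 ;;
        esac
    done
}
```

Usage: `in_groups render video`. On Debian, a missing membership can be
added with `sudo adduser <user> render` (followed by a fresh login).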


Binary Packages
===============

Each package provides three utilities and one autopkgtest backend:

rocm-qemu-support
~~~~~~~~~~~~~~~~~

  * rocm-qemu-setup    Prepare the system for GPU pass-through
  * rocm-qemu-create   Build QEMU images with GPU pass-through support
  * rocm-qemu-run      Boot a VM with GPU pass-through

  The autopkgtest-virt-qemu+rocm backend extends the regular backend
  with automatic setup of GPU pass-through.
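GPU pass-through requires the IOMMU to be enabled on the host, and every
device sharing the GPU's IOMMU group has to be handed to VFIO along with it.
As a sketch of the kind of precondition involved (a hypothetical helper, not
part of rocm-qemu-support; `SYSFS` is an assumed override for testing), the
groups can be listed from sysfs:

```sh
# Hypothetical helper, not part of rocm-qemu-support: list PCI devices per
# IOMMU group. SYSFS can be overridden for testing; defaults to /sys.
list_iommu_groups() {
    sysfs=${SYSFS:-/sys}
    found=0
    for dev in "$sysfs"/kernel/iommu_groups/*/devices/*; do
        # If nothing matched, the glob stays literal and does not exist
        [ -e "$dev" ] || continue
        found=1
        # .../iommu_groups/<N>/devices/<pci-addr> -> <N>
        group=$(basename "$(dirname "$(dirname "$dev")")")
        echo "group $group: $(basename "$dev")"
    done
    if [ "$found" -eq 0 ]; then
        echo "no IOMMU groups found; is the IOMMU enabled?" >&2
        return 1
    fi
}
```

If the function reports no groups, the IOMMU is disabled; this usually
means enabling it in the firmware and, on Intel systems, possibly booting
with intel_iommu=on.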

rocm-podman-support
~~~~~~~~~~~~~~~~~~~
  * rocm-podman-setup    Prepare the system for GPU-in-container use
  * rocm-podman-create   Build podman images with GPU support
  * rocm-podman-run      Run a command in a container with GPU support

  The autopkgtest-virt-podman+rocm backend extends the regular backend
  with automatic setup of devices and subuid mappings.
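For the container side, ROCm needs access to /dev/kfd and the DRM render
nodes under /dev/dri. A hypothetical helper that assembles the matching
podman --device options is sketched below; it is not how rocm-podman-run
is implemented, it ignores subuid mappings entirely, and `DEV` is an
assumed override for testing.

```sh
# Hypothetical helper, not part of rocm-podman-support: print the podman
# --device options for the ROCm device nodes present under DEV (default /dev).
rocm_device_args() {
    dev=${DEV:-/dev}
    # /dev/kfd is the ROCm compute interface; without it, nothing works
    if [ ! -e "$dev/kfd" ]; then
        echo "missing $dev/kfd; is the amdgpu driver loaded?" >&2
        return 1
    fi
    args="--device $dev/kfd"
    # Add every render node (renderD128, renderD129, ...) that exists
    for node in "$dev"/dri/renderD*; do
        [ -e "$node" ] || continue
        args="$args --device $node"
    done
    printf '%s\n' "$args"
}
```

Usage: `podman run --rm $(rocm_device_args) <image> <command>`, with the
command substitution deliberately left unquoted so that it splits into
separate options.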


Example use
===========

# Run setup to determine what changes your system needs
$ rocm-qemu-setup

# Alternatively, if you want to run as a user
$ rocm-qemu-setup -u somename


# Create default image for unstable
$ sudo rocm-qemu-create /var/tmp/unstable.img

# Like above, but include an SSH key
$ sudo rocm-qemu-create -a /path/to/authorized_keys

# Create image for bookworm, using a fast local mirror (e.g. an APT cache)
$ sudo rocm-qemu-create -r bookworm -m http://10.1.2.3:9999/debian

# If you want a regular user to run the image
$ sudo chown <user>: /var/tmp/unstable.img


# Run rocrand's autopkgtests (from the official archive) in that image
# (Use --cpus and --ram-size where needed)
$ autopkgtest -B rocrand -- qemu+rocm /var/tmp/unstable.img
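When testing several packages against the same image, a small loop helps.
A sketch, assuming the image created above; `run_rocm_tests` and the
`AUTOPKGTEST` override are hypothetical, and any non-zero autopkgtest exit
status (which includes 2 for skipped tests) is treated as a failure:

```sh
# Hypothetical wrapper: run autopkgtest for each package in the given image
# and report the packages whose run exited non-zero.
# AUTOPKGTEST can be overridden (e.g. for a dry run); defaults to autopkgtest.
run_rocm_tests() {
    image=$1
    shift
    failed=""
    for pkg in "$@"; do
        if ! ${AUTOPKGTEST:-autopkgtest} -B "$pkg" -- qemu+rocm "$image"; then
            failed="$failed $pkg"
        fi
    done
    if [ -n "$failed" ]; then
        printf '%s\n' "failed:$failed" >&2
        return 1
    fi
}
```

Usage: `run_rocm_tests /var/tmp/unstable.img rocrand <more packages>`.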


# Boot into the image if you need to, here with a shared directory
$ rocm-qemu-run -d /dir/on/host /var/tmp/unstable.img


Next Steps
==========

Please do share feedback on these utilities -- not just bugs, but also
anything you think is worth improving. These utilities are
prototypes built on the path of least resistance (shell scripts) with
the intent to gather experience that will aid in the design of an
eventual "proper" library.

I'm now working on prepping the CI infra launch, for which these
utilities were a precondition. I don't think I will be able to complete
this by Friday, so that will probably happen early next week.

Best,
Christian

[1] https://salsa.debian.org/rocm-team/rocm-dev-tools

[2] https://lists.debian.org/debian-ai/2023/03/msg00038.html

