gpu-isolation-tools 0.1 uploaded: Initial release
Hi,
I have uploaded src:gpu-isolation-tools to the APT repo [1].
It builds the gpuisol-podman and gpuisol-qemu packages, which include
scripts for building and running containers and VM images, respectively,
with AMD or NVIDIA GPU support.
These packages also ship autopkgtest backends which make use of the
above containers / VM images:
* qemu+rocm, podman+rocm
* qemu+cuda, podman+cuda
Please see the README [1] for examples of how to use all of this.
Background
==========
This is an evolution of the rocm-podman-support and rocm-qemu-support
packages shipped by src:pkg-rocm-tools. At DebConf, I was asked a few
times if these needed to be GPU-specific, which they didn't.
The gpuisol-* packages provide a strict superset of the functionality of
rocm-*-support, so they have Breaks/Replaces on them.
The new packages also include a number of bugfixes and improvements
over rocm-*-support; for example, gpuisol-qemu-create no longer needs
root to build VM images.
Features and Limitations
========================
Currently, AMD and NVIDIA GPUs are supported. With QEMU, individual GPUs
can be selected but multi-GPU pass-through has not yet been tested. With
podman, all devices are currently passed through.
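
For readers unfamiliar with the underlying mechanisms, here is a rough
sketch of what the two kinds of pass-through generally look like at the
command-line level. It is not taken from the gpuisol-* scripts: the
function names, the image name, and the PCI address are made up for
illustration. It assumes VFIO pass-through for QEMU and the usual
/dev/kfd and /dev/dri device nodes for ROCm in containers.

```python
import re

# PCI address in domain:bus:device.function form, e.g. 0000:03:00.0
_PCI_ADDR = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def qemu_vfio_args(pci_addr):
    """Return QEMU arguments that pass one host PCI device into a VM via VFIO."""
    if not _PCI_ADDR.match(pci_addr):
        raise ValueError(f"not a PCI address: {pci_addr!r}")
    return ["-device", f"vfio-pci,host={pci_addr}"]


def podman_rocm_args(image, command):
    """Return a podman command line exposing all AMD GPU device nodes."""
    return [
        "podman", "run", "--rm",
        "--device", "/dev/kfd",  # ROCm compute interface
        "--device", "/dev/dri",  # render nodes of all GPUs on the host
        image, *command,
    ]


if __name__ == "__main__":
    # Hypothetical PCI address and image name, for illustration only.
    print(" ".join(qemu_vfio_args("0000:03:00.0")))
    print(" ".join(podman_rocm_args("example-rocm-image", ["rocminfo"])))
```

Selecting a single GPU in the QEMU case comes down to choosing which PCI
address to hand over, whereas passing the whole /dev/dri directory is
what makes the podman case all-or-nothing.
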
For NVIDIA GPUs in containers, some of the device-level libraries inside
the container need to match the host kernel to a degree. However, NVIDIA
ships a fully open source [2] container toolkit, with both docker and
podman support, that seems to solve all container-related problems,
including passing in individual devices on a host with several GPUs.
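
As a hedged illustration of the per-device selection the toolkit offers:
with a CDI specification generated on the host (for example with
nvidia-ctk cdi generate), podman can be asked for a GPU by its CDI name.
The helper below is a sketch only. Its name and the image name are
hypothetical, and it is not part of gpuisol-podman.

```python
def podman_cdi_args(image, command, gpu="all"):
    """Return a podman command line requesting NVIDIA GPUs by CDI name.

    gpu is either "all" or a device index such as "0". This assumes a
    CDI specification for nvidia.com/gpu already exists on the host.
    """
    return [
        "podman", "run", "--rm",
        "--device", f"nvidia.com/gpu={gpu}",
        image, *command,
    ]


if __name__ == "__main__":
    # Hypothetical image name; request only the first GPU.
    print(" ".join(podman_cdi_args("example-cuda-image", ["nvidia-smi"], gpu="0")))
```
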
Please feel free to suggest other features.
Outlook
=======
Once these utilities have seen adequate usage and the naming problem has
been solved, I will upload them to the Debian Archive.
I will work on feeding the GPU-specific support of the autopkgtest
backends (autopkgtest-virt-*) back into the upstream 'qemu' and 'podman'
backends. It's not a huge amount of code, but it is conceptually
complex: it needs to be broken down into digestible chunks, and the
interfaces need to be sound.
It should be easy to add other GPU / NPU manufacturers to these
utilities, though I'd need access to hardware to test that.
Naming
======
As always, the biggest pain. The current names are odd, and not entirely
accurate (not every accelerator is a GPU).
The names I settled on are names I could live with, but if anyone can
think of a superior solution, please let me know.
Best,
Christian
[1]: https://apt.ai.debian.net/
[2]: https://github.com/NVIDIA/nvidia-container-toolkit