Re: introduction

To: Brian DeRocher <brian@derocher.org>, Debian ROCm Team <debian-ai@lists.debian.org>
Subject: Re: introduction
From: Christian Kastner <ckk@debian.org>
Date: Fri, 19 Jul 2024 20:02:09 +0200
Message-id: <b2da5a57-a321-4b23-9cff-80cd0edf1c8b@debian.org>
In-reply-to: <22f57033-a5b1-4c8a-92e4-b2ff0e06f72d@derocher.org>
References: <22f57033-a5b1-4c8a-92e4-b2ff0e06f72d@derocher.org>

Hi Brian,

On 2024-07-19 01:37, Brian DeRocher wrote:
> I'd like to introduce myself.  And try to help out.

Welcome, and thanks for taking an interest!

>   * mobo: Tyan S805
>   * cpu: Epyc 9354P
>   * gpu: Gigabyte RX 7900 GRE

Right up front: I have not yet succeeded in passing through a gfx1100
device.

The last time I tried was in April. It is quite possible that newer
kernels and firmware have fixed the issues I ran into.

Also, the mainboard presents a configuration that I do not yet have
experience with.

> I'd like to keep Debian stable on my host, and pass the GPU into VMs as
> needed for data science projects.

I also run stable, but tend to use the kernel from bookworm-backports as
it's usually pretty recent, and might contain relevant bugfixes.

bookworm-backports also has a newer QEMU, but I'm not sure it would make
much of a difference on the host. Can't hurt to try, though.

In the guest, with the 7900 I'd definitely first try to get things
running with the unstable distribution (newest kernel, newest firmware,
newest ROCm packages).

> As I get this working, I'd like to share my notes and test results. 
> Through Level1Techs, Proxmox, and Unraid, there's a lot of confusion out
> there.  Though this proxmox page [1] is pretty good.  Arch pages are
> pretty good too.
> 
> [1] https://pve.proxmox.com/wiki/PCI(e)_Passthrough

Yes, those pages (plus Gentoo) are excellent, and helped me with my own
experiments.

> I've tried to follow instructions here [2], but no success yet.  There
> are a couple typos here I'd like to fix.
> 
> [2] https://salsa.debian.org/rocm-team/community/team-project/-/wikis/qemu-with-gpu-pass-through
>
> When I identify the devices to pass through, and subsequently find other
> devices in the same IOMMU group, I find these (aside from the sound card):
> 
>   * c2:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI]
>     Navi 10 XL Downstream Port of PCI Express Switch [1002:1479] (rev 10)
>   * c1:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI]
>     Navi 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev 10)
>   * c0:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD]
>     Device [1022:149f] (rev 01)
> 
> Should these be passed through too?

TTBOMK, yes. (I think VFIO even warns when passing in a device without
adding all other devices of an IOMMU group).
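
To see at a glance which devices share a group, the kernel's sysfs tree
under /sys/kernel/iommu_groups can be walked directly. The following is
only a minimal sketch in Python; the function names are made up for
illustration, and the address 0000:c3:00.0 in the usage note is a
hypothetical placeholder, not taken from the hardware above.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses in it."""
    groups = {}
    base = Path(root)
    if not base.is_dir():
        # IOMMU disabled, or not running on Linux: nothing to report.
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(
                entry.name for entry in devices_dir.iterdir()
            )
    return groups


def group_of(address, groups):
    """Return the group number containing `address`, or None."""
    for number, members in groups.items():
        if address in members:
            return number
    return None


if __name__ == "__main__":
    for number, members in sorted(iommu_groups().items()):
        print(f"group {number}: {' '.join(members)}")
```

Calling group_of("0000:c3:00.0", iommu_groups()) would then give the
group to inspect for a GPU at that (hypothetical) address.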

Herein lies the first major obstacle: when passing in those devices, how
does one configure them in the guest?

On the host, take a look at `lspci -t`. In our scripts, I tried to
replicate the same structure in the guest. (IIRC it was even necessary,
as otherwise interrupts didn't get routed correctly, but it's been a
year since my deep-dive).

However, I never encountered devices in an IOMMU group other than GPU
and audio, so I cannot really say if this is necessary or not.

If the devices above and the GPUs are not on the same branch, there's a
good chance that it is sufficient to pass in the devices, and just
ignore them in the guest.
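
Whether two devices sit on the same branch can also be read off sysfs:
resolving /sys/bus/pci/devices/<address> yields a path that lists every
bridge above the device. Below is a rough sketch under that assumption;
the helper names and the example paths in the usage note are
hypothetical and only illustrate the shape of such paths.

```python
import os
import re

# Matches a full PCI address such as 0000:c1:00.0 (domain:bus:device.function).
PCI_ADDR = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def upstream_chain(resolved_path):
    """PCI addresses found on a resolved sysfs path, topmost bridge first."""
    return [part for part in resolved_path.split("/") if PCI_ADDR.match(part)]


def shares_branch(path_a, path_b):
    """True if one device is the other, or sits upstream of it."""
    a, b = upstream_chain(path_a), upstream_chain(path_b)
    n = min(len(a), len(b))
    return n > 0 and a[:n] == b[:n]


def resolve(address):
    """Resolve a PCI address to its real sysfs path (Linux only)."""
    return os.path.realpath(f"/sys/bus/pci/devices/{address}")
```

For example, with a hypothetical GPU path of
/sys/devices/pci0000:c0/0000:c0:01.1/0000:c1:00.0/0000:c2:00.0/0000:c3:00.0,
a bridge resolving to /sys/devices/pci0000:c0/0000:c0:01.1/0000:c1:00.0
would count as being on the same branch, while a device resolving to
/sys/devices/pci0000:c0/0000:c0:01.0 would not.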

> My other question for now is, my host is Debian Bookworm running
> 6.1.0-22-amd64.  Is this too old?

Should be fine, but see above.

> In my guest OS, based on qemu-rocm-build, I was seeing that 2 firmware
> files were not being found.
> 
>   * [    5.517007] amdgpu 0000:01:00.0: firmware: failed to load
>     amdgpu/gc_11_0_0_rlc_1.bin (-2)
>   * [    5.533495] amdgpu 0000:01:00.0: firmware: failed to load
>     amdgpu/gc_11_0_0_mes_2.bin (-2)
> 
> So I copied them from here [3], the whole folder, not just the 2 above.
> 
> [3]
> https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/

Incidentally, the firmware-amd-graphics package was recently updated to
the newest firmware [1], hence my suggestion to try unstable first.

> But this results in a bad crash.
> 
> [----------] 42 tests from rocrand_basic_tests/rocrand_basic_tests
> [ RUN      ] rocrand_basic_tests/rocrand_basic_tests.rocrand_create_destroy_generator_test/0
> [       OK ] rocrand_basic_tests/rocrand_basic_tests.rocrand_create_destroy_generator_test/0 (0 ms)
> [ RUN      ] rocrand_basic_tests/rocrand_basic_tests.rocrand_create_destroy_generator_test/1
> error: kvm run failed Bad address
> RAX=0000000000003398 RBX=0000000000000674 RCX=00030001070de073 RDX=0000000000000673
> RSI=ff623407f0003398 RDI=ff3ceee645b00000 RBP=ff3ceee64717d4e0 RSP=ff623407c066b718
> R8 =0003000000000073 R9 =ff623407f0000000 R10=ff3ceee645b0faf8 R11=ff3ceee6463d04b8
> R12=ff3ceee645b00000 R13=0003000000000073 R14=ff623407f0000000 R15=0000000000000674
> RIP=ffffffffc1159a13 RFL=00000282 [--S----] CPL=0 II=0 A20=1 SMM=0 HLT=0

This was the exact failure mode that I saw in April (kvm run failed Bad
address).

The test failing (...generator_test/1) is the first test that accesses
GPU memory IIRC.

I had to stop at that point, for lack of time.

> So, what's my next step and how can I help?

Help in solving this would be massively appreciated. In our CI [2], we
currently isolate gfx1100 in podman containers, but being able to use
QEMU with pass-through would be a huge win.

To be honest, I found the setup I documented in the wiki above simply
through lots of trial and error. For example, for ROCm use, I found that
I had to pass x-vga=off to QEMU, contrary to what the guides said (which
were for graphical output). And the PCI bridges in the guest needed a
specific setup.
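
As a rough illustration of the x-vga=off part, here is a small Python
sketch that assembles QEMU -device arguments for a vfio-pci device. The
helper names, the host address 0000:c3:00.0 and the root port id "rp1"
are hypothetical, and the pcie-root-port line is only an assumption
about what a guest-side bridge might look like, not the layout from the
wiki.

```python
def vfio_device_arg(host_address, bus=None, x_vga=False):
    """Build a QEMU -device argument for a passed-through PCI device."""
    props = ["vfio-pci", f"host={host_address}"]
    if bus:
        # Attach the device below a specific guest bridge or root port.
        props.append(f"bus={bus}")
    props.append(f"x-vga={'on' if x_vga else 'off'}")
    return ["-device", ",".join(props)]


def root_port_arg(port_id, chassis, slot):
    """Build a QEMU -device argument for a guest PCIe root port (q35)."""
    return ["-device", f"pcie-root-port,id={port_id},chassis={chassis},slot={slot}"]


if __name__ == "__main__":
    args = root_port_arg("rp1", 1, 1) + vfio_device_arg("0000:c3:00.0", bus="rp1")
    print(" ".join(args))
```

The resulting list can be appended to an existing QEMU command line;
x_vga defaults to off here because that is what worked for compute use.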

I see at least three obstacles, though:
  (1) The IOMMU group thing
  (2) The 7900, which should work in theory, but hasn't been shown to
      work yet
  (3) Possibly outdated ROCm libraries triggering some odd thing
      (though this one is a real stretch)

Normally I'd tackle these independently, first resolving (1) by using a
known-good card and when the IOMMU setup is confirmed working, tackle
(2). Known-good cards on my end are:
  * gfx1030 (6800/6900 XT)
  * gfx1032 (6600/6650 XT)
  * gfx1034 (6500 XT)
If you don't have any of these cards or cannot borrow one, I think I got
the gfx1034 for EUR100 (~USD110) on my local auction site (similar to eBay).

Alternatively, since you managed to get as far as I did with the error
above, it wouldn't be too wild to assume that you've already solved (1).

I myself am going to take another look at this mid-August, so I'm happy
to collaborate.

Best,
Christian

[1]: https://packages.debian.org/sid/firmware-amd-graphics
[2]: https://ci.rocm.debian.net
