
Re: ci.rocm.debian.net: gfx1100, gfx1101 broken with (possibly) 6.12.11



Hi Christian,

On 2025-04-24 02:04, Christian Kastner wrote:
> > I have, however, put in a request for a few Radeon PRO V710 (gfx1101)
> > samples. I have plenty of PCIe slots, so I'm hoping that those will work
> > well with PCIe pass-through. My guess will be that I will have some
> > gfx1101 workers in June or July.
>
> I'm afraid this is unlikely, unless someone finds the root cause (e.g.
> by bisecting 6.1.129-6.1.133).
>
> If you recall, we initially had troubles getting gfx110x going, and
> Brian DeRocher bisected the issue which pointed to AGP, which led to
> this illuminating thread [1] where it was mentioned that memory mapping
> for gfx110x needed a workaround for a hardware bug.

Maybe, maybe not. The quirks of PCIe passthrough may be board-specific. They often trace back to the VBIOS or firmware, because they tend to arise during initialization (or reinitialization). The VBIOS, at least, can often vary by part number [1].

Unlike the Radeon PRO W7700, the Radeon PRO V710 was designed to support virtualized workloads. They're both gfx1101 chips, but the boards are different. That's no guarantee of anything, but I'm hoping they'll be less finicky for this kind of setup.

Sincerely,
Cory Bloor

[1]: A product name like "MI100" in fact refers to a collection of ever so slightly different parts. Having acquired a number of MI25 GPUs off eBay, I can tell you that they don't all have the same appearance, power connectors, or default power limits. Those may very well have been customized for each OEM, or even for individual server products. They will all run the same code, but passthrough takes unsupported paths through the initialization of the card and could definitely highlight differences between parts that might otherwise seem functionally identical.

