
Re: RFS: rccl/5.4.3-3~exp1 -- ROCm Communication Collectives Library



Hi Christian,

On 2024-03-26 12:42, Christian Kastner wrote:
> I assume this is still up-to-date? (I added a d/gbp.conf.)

Yes.

> Slightly tangential: What do you think about setting up a specific
> worker configuration for multi-GPU tests, for example configuring
> pinwheel as
>    * amd64+gfx90a when one GPU is in use
>    * amd64+gfx90a_x2 (or similar) when both GPUs are in use?
>
> pinwheel/gfx90a is just one example, other configurations would of
> course also work.

I'm not sure. I thought amd64 and gfx90a were the ISAs, but x2 is more a question of system configuration. I may have misunderstood the meaning.
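To make sure we're talking about the same thing, here's a minimal HIP sketch of how I read it (untested; hipGetDeviceCount() and the gcnArchName field of hipDeviceProp_t are standard HIP API, the rest is just illustration). The device count describes the system configuration, while the ISA name is a property of each device:

    // The GPU count describes the system configuration; the ISA
    // name is reported per device.
    #include <hip/hip_runtime.h>
    #include <cstdio>

    int main() {
        int count = 0;
        if (hipGetDeviceCount(&count) != hipSuccess) {
            std::fprintf(stderr, "no HIP devices visible\n");
            return 1;
        }
        std::printf("system configuration: %d GPU(s)\n", count);
        for (int i = 0; i < count; ++i) {
            hipDeviceProp_t prop;
            if (hipGetDeviceProperties(&prop, i) == hipSuccess) {
                // e.g. "gfx90a:sramecc+:xnack-" on an MI200-class device
                std::printf("device %d ISA: %s\n", i, prop.gcnArchName);
            }
        }
        return 0;
    }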

A related topic is that AMD is no longer following the 1:1 mapping between ISA and architecture that spawned the identical gfx103{0,1,2,3,4,5,6} ISAs. The recent Mendocino chips (Radeon 610M) report a gfxip of gfx1037 to the driver, but the clang developers chose to reuse the gfx1036 ISA rather than create yet another identical gfx103x ISA. Such reuse used to happen more often: the gfx803 ISA was shared by many different GPUs. For example, the MI6 and MI8 were Ellesmere and Fiji, respectively, but both were gfx803.

I don't really have any strong opinions about how the CI should handle some of these more complex hardware requirements. Your suggestion seems reasonable, although I'm not sure we want to add an amd64+gfx90a_x2 row to each package status page. I think we could get away with our current configuration for a while, if we want to spend more time thinking this through. Argo has four gfx803 GPUs in the container when it runs the autopkgtests, and it's currently working on the amd64+gfx803 queue.
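If we do eventually split queues by GPU count, the tests themselves could also guard against missing hardware. A rough sketch (assuming the test is declared with the skippable restriction, so that exit status 77 is reported as a skip rather than a failure):

    // Bail out of a multi-GPU test when the worker exposes fewer
    // GPUs than the test needs.
    #include <hip/hip_runtime.h>
    #include <cstdio>

    int main() {
        int count = 0;
        if (hipGetDeviceCount(&count) != hipSuccess || count < 2) {
            std::fprintf(stderr, "fewer than two GPUs visible, skipping\n");
            return 77; // autopkgtest treats 77 as "skip" under skippable
        }
        // ... the actual multi-GPU test would run here ...
        return 0;
    }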

Sincerely,
Cory Bloor

