
Re: Building llama.cpp for AMD GPU using only Debian packages?



On Sat, 1 Feb 2025 at 00:47, Cordell Bloor <cgmb@slerp.xyz> wrote:
> What models should we be tuning for?

The tuning would be for an architecture, not a specific model, right?

afaics the most used model architectures are:

- Qwen2ForCausalLM (Qwen 2.5, Qwen 2.5 Coder)
- LlamaForCausalLM (Llama 3.0, 3.1, 3.2)
- MistralForCausalLM (Mistral, Ministral, etc.)

These models are popular for local inference because their memory
requirements are realistic: they fit on one to four gaming GPUs, or
on old, cheap datacentre GPUs. Most organisations and individuals
publishing finetunes also use these models as a base for their
improvements.
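
(The class names above are the Hugging Face ones; in the GGUF files
that llama.cpp consumes, the architecture is recorded under the
general.architecture key as a short name like "qwen2" or "llama". If
I remember the tooling correctly, you can check a given file with the
gguf-dump script from the gguf Python package:

    gguf-dump model.gguf | grep general.architecture

That key seems like the natural thing to classify by when deciding
what to tune for.)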

Of course, DeepSeek R1 has become popular in the last month, but it
is not often run locally due to its high memory requirements (671B
parameters).

> > AMD's marketing for RDNA3 (7900XTX) also uses Vulkan
> I wouldn't read too much into that. That does suggest that the Vulkan
> implementation was faster, but we don't know if that's a well-optimized
> result. I suspect it's not.

In the Reddit thread where this was discussed, someone tested ROCm
vs Vulkan on their 7900 XT or XTX (I forget which) and found that
Vulkan had a token-generation advantage of a few per cent.
Unfortunately, they have since deleted their comment.
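
For anyone wanting to reproduce that comparison, something along
these lines should work against a recent llama.cpp checkout. Treat
it as a sketch: the CMake option names have changed over time (older
trees used GGML_HIPBLAS rather than GGML_HIP), gfx1100 assumes a
7900 XT/XTX, and model.gguf stands in for whatever model you test.

    # Vulkan build
    cmake -B build-vulkan -DGGML_VULKAN=ON
    cmake --build build-vulkan --config Release -j

    # ROCm/HIP build
    HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
        cmake -B build-rocm -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1100
    cmake --build build-rocm --config Release -j

    # benchmark the same model and offload settings on each backend
    ./build-vulkan/bin/llama-bench -m model.gguf -ngl 99
    ./build-rocm/bin/llama-bench -m model.gguf -ngl 99

llama-bench reports prompt processing (pp) and token generation (tg)
rates separately, so a difference like the one in that deleted
comment would show up in the tg numbers.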

Jamie

