Re: Building llama.cpp for AMD GPU using only Debian packages?
On Sat, 1 Feb 2025 at 00:47, Cordell Bloor <cgmb@slerp.xyz> wrote:
> What models should we be tuning for?
The tuning would be for an architecture, not a specific model, right?
afaics the most used model architectures are:
- Qwen2ForCausalLM (Qwen 2.5, Qwen 2.5 Coder)
- LlamaForCausalLM (Llama 3.0, 3.1, 3.2)
- MistralForCausalLM (Mistral, Ministral, etc)
These models are popular for local inference because their memory
requirements are realistic: they fit on 1 to 4 gaming GPUs, or on old,
cheap datacentre GPUs. Also, most organisations and individuals doing
finetunes use these models as a base for their improvements.
Of course, DeepSeek R1 has become popular in the last month, but it is
rarely run locally due to its high memory requirements (671B
parameters).
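
As an aside, the architecture a given GGUF file targets is recorded in
its metadata under general.architecture, so it's easy to check what
you'd actually be tuning for. Here's a rough, untested sketch that
reads that key straight from the file header, following the GGUF spec
(type codes and layout are from the spec; it assumes GGUF v2/v3, which
use uint64 counts):

    import struct
    import sys

    # Sizes in bytes of the scalar GGUF metadata value types.
    # Type codes per the GGUF spec: 8 = string, 9 = array.
    SCALAR_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4,
                    6: 4, 7: 1, 10: 8, 11: 8, 12: 8}

    def read_str(f):
        # GGUF strings are a uint64 length followed by UTF-8 bytes.
        (n,) = struct.unpack("<Q", f.read(8))
        return f.read(n).decode("utf-8")

    def skip_value(f, vtype):
        if vtype == 8:  # string
            (n,) = struct.unpack("<Q", f.read(8))
            f.seek(n, 1)
        elif vtype == 9:  # array: element type, count, then elements
            (etype,) = struct.unpack("<I", f.read(4))
            (count,) = struct.unpack("<Q", f.read(8))
            for _ in range(count):
                skip_value(f, etype)
        else:
            f.seek(SCALAR_SIZES[vtype], 1)

    def gguf_architecture(path):
        with open(path, "rb") as f:
            if f.read(4) != b"GGUF":
                raise ValueError("not a GGUF file")
            version, n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
            # Scan the metadata KV pairs for general.architecture.
            for _ in range(n_kv):
                key = read_str(f)
                (vtype,) = struct.unpack("<I", f.read(4))
                if key == "general.architecture" and vtype == 8:
                    return read_str(f)
                skip_value(f, vtype)
        return None

    if __name__ == "__main__":
        print(gguf_architecture(sys.argv[1]))

IIRC Mistral-family GGUFs also report "llama" here, since they share
the same compute graph, which rather supports the point that tuning
targets architectures, not individual models.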
> > AMD's marketing for RDNA3 (7900XTX) also uses Vulkan
> I wouldn't read too much into that. That does suggest that the Vulkan
> implementation was faster, but we don't know if that's a well-optimized
> result. I suspect it's not.
In the Reddit thread where this was discussed, someone benchmarked
ROCm against Vulkan on their 7900 XT or XTX (I forget which) and found
that Vulkan was a few per cent faster at token generation.
Unfortunately, they've since deleted their comment.
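
For what it's worth, that sort of comparison is easy to reproduce with
llama.cpp's llama-bench tool. A rough sketch (untested; the binary and
model paths are placeholders for two local builds of the same tree,
e.g. one configured with -DGGML_HIP=ON and one with -DGGML_VULKAN=ON):

    import subprocess

    MODEL = "models/llama-3.1-8b-instruct-q4_k_m.gguf"  # placeholder path

    BUILDS = {
        "ROCm":   "build-hip/bin/llama-bench",     # placeholder: HIP build
        "Vulkan": "build-vulkan/bin/llama-bench",  # placeholder: Vulkan build
    }

    for name, bench in BUILDS.items():
        print(f"=== {name} ===")
        # -p 512 measures prompt processing, -n 128 measures token
        # generation (the number being compared above), -ngl 99
        # offloads all layers to the GPU.
        result = subprocess.run(
            [bench, "-m", MODEL, "-p", "512", "-n", "128", "-ngl", "99"],
            capture_output=True, text=True, check=True,
        )
        print(result.stdout)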
Jamie