
Re: Building llama.cpp for AMD GPU using only Debian packages?



On Sat, 1 Feb 2025 at 00:47, Cordell Bloor <cgmb@slerp.xyz> wrote:
> What models should we be tuning for?

The tuning would be for an architecture, not a specific model, right?

afaics the most used model architectures are:

- Qwen2ForCausalLM (Qwen 2.5, Qwen 2.5 Coder)
- LlamaForCausalLM (Llama 3.0, 3.1, 3.2)
- MistralForCausalLM (Mistral, Ministral, etc.)

These models are popular for local inference because their memory
requirements are realistic: they fit on one to four gaming GPUs, or
on old, cheap datacentre GPUs. Most organisations and individuals
publishing finetunes also use these models as a base for their
improvements.
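
(The class names above are the Hugging Face ones; in the GGUF files
that llama.cpp consumes, the architecture is recorded under the
general.architecture key as a short name like "qwen2" or "llama". If
I remember the tooling correctly, you can check a given file with the
gguf-dump script from the gguf Python package:

    gguf-dump model.gguf | grep general.architecture

That key seems like the natural thing to classify by when deciding
what to tune for.)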

Of course, DeepSeek R1 has become popular in the last month, but it
is not often run locally due to its high memory requirements (671B
parameters).

> > AMD's marketing for RDNA3 (7900XTX) also uses Vulkan
> I wouldn't read too much into that. That does suggest that the Vulkan
> implementation was faster, but we don't know if that's a well-optimized
> result. I suspect it's not.

In the Reddit thread where this was discussed, someone tested ROCm
vs Vulkan on their 7900 XT or XTX (I forget which) and found that
Vulkan had a token-generation advantage of a few per cent.
Unfortunately, they have since deleted their comment.
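
For anyone wanting to reproduce that comparison, something along
these lines should work against a recent llama.cpp checkout. Treat
it as a sketch: the CMake option names have changed over time (older
trees used GGML_HIPBLAS rather than GGML_HIP), gfx1100 assumes a
7900 XT/XTX, and model.gguf stands in for whatever model you test.

    # Vulkan build
    cmake -B build-vulkan -DGGML_VULKAN=ON
    cmake --build build-vulkan --config Release -j

    # ROCm/HIP build
    HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
        cmake -B build-rocm -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1100
    cmake --build build-rocm --config Release -j

    # benchmark the same model and offload settings on each backend
    ./build-vulkan/bin/llama-bench -m model.gguf -ngl 99
    ./build-rocm/bin/llama-bench -m model.gguf -ngl 99

llama-bench reports prompt processing (pp) and token generation (tg)
rates separately, so a difference like the one in that deleted
comment would show up in the tg numbers.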

Jamie

