Re: Bug#1063673: ITP: llama.cpp -- Inference of Meta's LLaMA model (and others) in pure C/C++

To: debian-ai@lists.debian.org
Subject: Re: Bug#1063673: ITP: llama.cpp -- Inference of Meta's LLaMA model (and others) in pure C/C++
From: Christian Kastner <ckk@debian.org>
Date: Thu, 6 Feb 2025 23:42:43 +0100
Message-id: <3bb8fa34-da8c-466a-b6e6-efdbbaa519f5@debian.org>
In-reply-to: <f5b3cee6bf0c867807fc06c93e83c3f835316b45.camel@debian.org>
References: <d373f55c-2869-490b-aeaf-0fba8c10c02e@debian.org> <d373f55c-2869-490b-aeaf-0fba8c10c02e@debian.org> <sa65xxw4jhn.fsf@hjemme.reinholdtsen.name> <22d3e2d2-cfbd-431d-9211-e902ac3dfe4b@debian.org> <22d3e2d2-cfbd-431d-9211-e902ac3dfe4b@debian.org> <d373f55c-2869-490b-aeaf-0fba8c10c02e@debian.org> <de29a469-6c9b-4025-bbed-988e10dc5a38@slerp.xyz> <0aa4f182-da25-4ba5-8d9f-a1d1f8ad9221@debian.org> <ece647c1-3dba-4737-a215-c93112990fe4@debian.org> <7976e018-a547-4bba-82ba-13847980356e@debian.org> <efae84f0-dfa9-4cd2-a869-752ae1bd22cd@debian.org> <c9735f7c-982a-4e81-a048-bc588833dccf@debian.org> <2f158aa02fac5d00dcdcfc8a6ce0ee2a147bc3c0.camel@debian.org> <sa6ldukhzw2.fsf@hjemme.reinholdtsen.name> <9d8ea37e-310e-4a61-83c2-b8820a17f016@debian.org> <sa67c63j4iv.fsf@hjemme.reinholdtsen.name> <35dc70e806d5a4c273a385b1b02770b8550e1b55.camel@debian.org> <af0ba2ed-ee57-4c54-b544-fa567884f7a8@debian.org> <f5b3cee6bf0c867807fc06c93e83c3f835316b45.camel@debian.org>

On 2025-02-06 15:14, M. Zhou wrote:
> For ppc64el, the llama.cpp-blas backend is way slower than the -cpu backend.

Actually I observed the same on amd64.

But only in my build, because I deactivate all feature flags in the CPU
backend, so -cpu is at our amd64 baseline.

> CPU is slow anyway. How does HIP perform?

It massively outperforms CPU, in line with Cory's observation [1].

Best,
Christian

[1]: https://lists.debian.org/debian-ai/2025/01/msg00136.html
