Bug#1063673: ITP: llama.cpp -- Inference of Meta's LLaMA model (and others) in pure C/C++
On 2025-02-06 02:42, M. Zhou wrote:
> I second this. llama-server is also the service endpoint for DebGPT.
I'll prioritize fixing this.
> I pushed a fix for ppc64el. The hwcaps works correctly for power9, given the baseline is power 8.
Ah, good catch. The broken install pattern came from a last-minute fix
that I had only tested on amd64...
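For anyone following along: the pattern in question is the glibc-hwcaps
layout, where an optimized build of the shared library is shipped in a
per-ISA subdirectory next to the baseline build, and the dynamic loader
picks the best match at runtime. A rough sketch of what that looks like
in a debhelper install file (hypothetical file and library names; the
actual sonames in the package may differ):

    # debian/libllama.install (hypothetical sketch)
    # baseline (power8) build
    usr/lib/powerpc64le-linux-gnu/libllama.so.*
    # power9-optimized build, selected automatically by ld.so on
    # hardware that advertises the matching hwcaps
    usr/lib/powerpc64le-linux-gnu/glibc-hwcaps/power9/libllama.so.*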
I meant to ask anyway: performance-wise, is the packaged build
comparable to your local one? I'm not sure what in the code could
account for a difference, but when I built and tested on platti.d.o,
performance was poor, so another data point would be useful.
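If it helps to put numbers on it, upstream's llama-bench tool produces
directly comparable throughput figures; something along these lines
should work, assuming the Debian package ships the tool under its
upstream name (the model path is a placeholder):

    # prompt-processing (512 tokens) and generation (128 tokens) throughput
    llama-bench -m /path/to/model.gguf -p 512 -n 128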
Best,
Christian