Re: Bug#1063673: ITP: llama.cpp -- Inference of Meta's LLaMA model (and others) in pure C/C++

To: debian-ai@lists.debian.org
Subject: Re: Bug#1063673: ITP: llama.cpp -- Inference of Meta's LLaMA model (and others) in pure C/C++
From: Mo Zhou <lumin@debian.org>
Date: Sun, 22 Dec 2024 14:42:59 -0500
Message-id: <efae84f0-dfa9-4cd2-a869-752ae1bd22cd@debian.org>
In-reply-to: <7976e018-a547-4bba-82ba-13847980356e@debian.org>
References: <d373f55c-2869-490b-aeaf-0fba8c10c02e@debian.org> <d373f55c-2869-490b-aeaf-0fba8c10c02e@debian.org> <sa6mss4bytd.fsf@hjemme.reinholdtsen.name> <fdedee66-9a55-475e-9e23-acfdfc351025@debian.org> <d373f55c-2869-490b-aeaf-0fba8c10c02e@debian.org> <sa65xxw4jhn.fsf@hjemme.reinholdtsen.name> <22d3e2d2-cfbd-431d-9211-e902ac3dfe4b@debian.org> <22d3e2d2-cfbd-431d-9211-e902ac3dfe4b@debian.org> <d373f55c-2869-490b-aeaf-0fba8c10c02e@debian.org> <de29a469-6c9b-4025-bbed-988e10dc5a38@slerp.xyz> <0aa4f182-da25-4ba5-8d9f-a1d1f8ad9221@debian.org> <ece647c1-3dba-4737-a215-c93112990fe4@debian.org> <7976e018-a547-4bba-82ba-13847980356e@debian.org>


On 12/22/24 13:03, Christian Kastner wrote:

>> To my point of view, llama.cpp is more suitable for source-based
>> distributions like Gentoo. In the past I proposed something similar for
>> Debian but the community was not interested in that.
>
> Well, we could still keep this in mind and implement it experimentally
> for some packages. After all, it's already happening in some sort with DKMS.
>
> Given how data is processed today, our amd64 baseline really does impede
> machine-learning software, so some solution will need to be found.

Apart from a source-based alternative distribution for Debian, "bumping the
amd64 baseline for selected packages" is another project I proposed a long
time ago:

  https://github.com/SIMDebian/SIMDebian

Software like Eigen3 and TensorFlow can benefit heavily from a baseline bump.
At that time PyTorch did not have run-time dispatch, but now it does.

>> In terms of the BLAS/MKL-like approach for SIMD capability
>> dispatching ... I bet focusing on something else is more worthwhile.
>
> I probably misunderstood something about the MKL packages. I saw
>    libmkl-avx
>    libmkl-avx2
>    libmkl-avx512
>    ...
> and I assumed that these are all the same library, just built with
> different optimizations. But they don't seem to conflict with each
> other, so I guess it's more subtle than that.

MKL might not be working the way you thought. The main library to link against
is libmkl-rt, which will dynamically link to the right implementation based
on the run-time CPU capability. That's just their specific way of implementing
the dynamic dispatch.
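
To illustrate the kind of run-time dispatch described above, here is a
minimal C sketch. It is not MKL's actual code. It assumes GCC or Clang on
x86 (for the __builtin_cpu_supports builtin), and the libexample_*.so file
names and both function names are hypothetical, chosen only for illustration.

```c
/* Pick the most capable variant the CPU can run, falling back to baseline.
 * The file names are hypothetical; they are NOT the names shipped by MKL
 * or by any Debian package. */
const char *select_variant(int has_avx512f, int has_avx2, int has_avx)
{
    if (has_avx512f) return "libexample_avx512.so";
    if (has_avx2)    return "libexample_avx2.so";
    if (has_avx)     return "libexample_avx.so";
    return "libexample_baseline.so";
}

/* Query the running CPU and choose a variant for it. */
const char *select_variant_for_this_cpu(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    /* GCC/Clang builtins; a nonzero result means the feature is supported. */
    __builtin_cpu_init();
    return select_variant(__builtin_cpu_supports("avx512f"),
                          __builtin_cpu_supports("avx2"),
                          __builtin_cpu_supports("avx"));
#else
    /* Assumption: on other compilers or architectures, use the baseline. */
    return select_variant(0, 0, 0);
#endif
}
```

A front-end library in the style of libmkl-rt would then pass the chosen
name to dlopen(3) and forward calls to the symbols it finds there.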

> My idea was to just build libllama X times with various optimizations
> enabled. So users can select the version best applicable to their CPU
> and/or GPU, but we would also have a non-optimized version that would
> satisfy our amd64 ISA requirements.

No need to do that manually. glibc already provides that kind of
dispatching functionality when you build multiple shared libraries with
different baselines. Please check

  https://lists.debian.org/debian-devel/2019/04/msg00057.html

Or concretely,

  section "Hardware capabilities" from ld.so(8)

I believe this is what you were expecting. But it seems that the avx2/avx512
dispatch is missing from the man page. I don't know what the status of their
support is.
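
A rough sketch of that lookup, under assumptions not taken from this thread:
glibc 2.33 and later should search the glibc-hwcaps/x86-64-v4, x86-64-v3 and
x86-64-v2 subdirectories of each library directory before the directory
itself, skipping levels the CPU lacks. The function name, the libllama.so
file name and the library directory used below are hypothetical. The sketch
only builds the candidate paths in priority order; it does not check CPU
support as the real loader does.

```c
#include <stdio.h>

/* glibc-hwcaps subdirectory names for x86-64, most demanding first. */
static const char *const hwcaps_levels[] = {
    "x86-64-v4", "x86-64-v3", "x86-64-v2"
};

/* Write the candidate path with priority `rank` into buf.
 * rank 0..2 selects a glibc-hwcaps subdirectory, rank 3 the baseline
 * directory. Returns 0 on success, -1 if rank is out of range or the
 * buffer is too small. */
int hwcaps_candidate(char *buf, size_t len, const char *libdir,
                     const char *soname, unsigned rank)
{
    int n;

    if (rank < 3)
        n = snprintf(buf, len, "%s/glibc-hwcaps/%s/%s",
                     libdir, hwcaps_levels[rank], soname);
    else if (rank == 3)
        n = snprintf(buf, len, "%s/%s", libdir, soname);
    else
        return -1;

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Calling it with ranks 0 through 3 lists where a package would install each
build of the same library; on a given system, "ld.so --help" should report
which of those subdirectories the loader actually considers.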

BTW, my personal conclusion on the SIMDebian project is that, while bumping
the baseline can indeed bring a lot of benefit, it is not necessary to do so
ourselves, because if SIMD performance is really important to this software,
(1) the users will figure out how to compile using -march=native. Consider
that this is typically also done in highly customized environments like HPC.
Those power users will recompile on their own to fit their needs anyway,
whatever we package.

(2) upstream will implement it soon after being requested. If the software
remains popular while upstream is not interested in implementing it, that
means SIMD was not necessary for them.
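
As a small illustration of point (1), the following C sketch reports which
x86 SIMD extension the compiler was allowed to target, using the predefined
macros that GCC and Clang set according to -march. The function name is
hypothetical, and the list of levels is deliberately short.

```c
/* Report the widest x86 SIMD extension enabled at compile time.
 * The macros are predefined by GCC and Clang depending on -march/-m flags. */
const char *compiled_simd_level(void)
{
#if defined(__AVX512F__)
    return "avx512f";
#elif defined(__AVX2__)
    return "avx2";
#elif defined(__AVX__)
    return "avx";
#elif defined(__SSE2__)
    return "sse2";
#else
    return "none";
#endif
}
```

Built with the distribution's default flags and again with -march=native,
the same source can report different levels, which is the gap a baseline
bump or a local rebuild is meant to close.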

I'm surprised that those discussions already happened 5 years ago.
