
Re: Bug#1063673: ITP: llama.cpp -- Inference of Meta's LLaMA model (and others) in pure C/C++



On 2025-01-27 15:37, M. Zhou wrote:
> On Mon, 2025-01-27 at 11:13 +0100, Christian Kastner wrote:
> BLAS itself only handles the float32, float64, complex float32, and complex
> float64 datatypes, which correspond to the "s", "d", "c", and "z" prefixes in
> the API. Quantized neural networks are unlikely to run in floating-point
> mode, but rather in integer modes like int4 and int8.

Ah, good to know.
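To make the quoted point concrete, here is a minimal, hypothetical sketch of symmetric int8 quantization: weights are stored as integers plus a float scale, so the hot inner loops can use integer arithmetic that the float-only s/d/c/z BLAS routines cannot cover. The function names and the single per-tensor scale are illustrative only; llama.cpp's actual quantization formats are more elaborate.

```python
def quantize_int8(weights):
    # Hypothetical symmetric quantization: map floats to [-128, 127]
    # using a single scale factor (real schemes use block-wise scales).
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate float values from the int8 representation.
    return [x * scale for x in q]

w = [0.5, -1.27, 0.03]
q, s = quantize_int8(w)   # integer weights plus one float scale
approx = dequantize(q, s) # close to the original float32 values
```

The matrix multiplications in quantized inference then operate on the integer side, which is why a float-only BLAS backend does not help there.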

> You may take libtorch2.5 as a reference: while building against
> libblas-dev, we can manually recommend high-performance BLAS
> implementations for the user to install:
> 
> Recommends: libopenblas0 | libblis4 | libmkl-rt | libblas3

I'll use that. Thanks!
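For reference, the "|" alternation in a Debian Recommends field means any one installed alternative satisfies it, and apt pulls in the first listed (here libopenblas0) by default. A sketch of how the stanza might look in debian/control; the surrounding fields are illustrative, not the actual packaging:

```
Package: llama.cpp
Architecture: any
Depends: ${shlibs:Depends}, ${misc:Depends}
Recommends: libopenblas0 | libblis4 | libmkl-rt | libblas3
```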

Best,
Christian
