
Re: Bug#1063673: ITP: llama.cpp -- Inference of Meta's LLaMA model (and others) in pure C/C++



On Thu, 2025-02-06 at 09:13 +0100, Christian Kastner wrote:
> 
> I meant to ask anyway: performance-wise, is it comparable to your local
> build? I mean, I wouldn't know what in the code would alter this, but I
> built and tested this on platti.d.o and performance was poor, so another
> data point would be useful.

For ppc64el, the llama.cpp-blas backend is way slower than the -cpu backend.
I did not test on amd64, but on ppc64el the package does not feel any
different from my local build.

The CPU backend is slow anyway. How does HIP perform?

model         | hardware, backend          | generation speed
phi-4-q4.gguf | power9, cpu (8 threads)    | 0.62 tokens/s
phi-4-q4.gguf | amd64 13900H, cpu          | 6.7 tokens/s

GPU is way faster than this, but the phi-4 model does not fit in my NVIDIA
GPU, so no GPU numbers this time.
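
If more data points are useful: the numbers above are token-generation rates,
and a rough sketch of how to collect comparable ones (assuming the Debian
package ships upstream's llama-bench tool; the model filename is just my local
file) would be something like:

    # CPU backend: 8 threads, llama-bench's default prompt/generation runs
    llama-bench -m phi-4-q4.gguf -t 8

    # GPU build (HIP/CUDA): offload all layers to the GPU with -ngl
    llama-bench -m phi-4-q4.gguf -ngl 99

llama-bench reports prompt-processing and token-generation throughput in
tokens/s, which is what the table above shows for generation.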

