Bug#1063673: ITP: llama.cpp -- Inference of Meta's LLaMA model (and others) in pure C/C++
Package: wnpp
Severity: wishlist
Owner: Christian Kastner <ckk@debian.org>
X-Debbugs-Cc: debian-devel@lists.debian.org, debian-ai@lists.debian.org

* Package name    : llama.cpp
  Version         : b2116
  Upstream Author : Georgi Gerganov
* URL             : https://github.com/ggerganov/llama.cpp
* License         : MIT
  Programming Lang: C++
  Description     : Inference of Meta's LLaMA model (and others) in pure C/C++

The main goal of llama.cpp is to enable LLM inference with minimal
setup and state-of-the-art performance on a wide variety of hardware -
locally and in the cloud.
* Plain C/C++ implementation without any dependencies
* Apple silicon is a first-class citizen - optimized via ARM NEON,
Accelerate and Metal frameworks
* AVX, AVX2 and AVX512 support for x86 architectures
* 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization for
faster inference and reduced memory use
* Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD
GPUs via HIP)
* Vulkan, SYCL, and (partial) OpenCL backend support
* CPU+GPU hybrid inference to partially accelerate models larger than
the total VRAM capacity
This package will be maintained by the Debian Deep Learning Team.
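
As a quick illustration of what the library offers consumers (a minimal
sketch, not part of the proposed package description; it assumes the
llama.h header as shipped around upstream b2116, whose function names may
change in later releases):

    // Minimal sketch: load a GGUF model through llama.h and query its
    // vocabulary size. A real program would also create a context,
    // tokenize a prompt, and run decoding; those steps are omitted here
    // for brevity.
    #include <cstdio>
    #include "llama.h"

    int main(int argc, char ** argv) {
        if (argc < 2) {
            std::fprintf(stderr, "usage: %s <model.gguf>\n", argv[0]);
            return 1;
        }

        llama_model_params mparams = llama_model_default_params();
        mparams.n_gpu_layers = 0;  // CPU only; raise to offload layers to a GPU backend

        llama_model * model = llama_load_model_from_file(argv[1], mparams);
        if (model == nullptr) {
            std::fprintf(stderr, "failed to load model: %s\n", argv[1]);
            return 1;
        }

        std::printf("vocab size: %d\n", llama_n_vocab(model));

        llama_free_model(model);
        return 0;
    }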