
Bug#1063673: ITP: llama.cpp -- Inference of Meta's LLaMA model (and others) in pure C/C++



Package: wnpp
Severity: wishlist
Owner: Christian Kastner <ckk@debian.org>
X-Debbugs-Cc: debian-devel@lists.debian.org, debian-ai@lists.debian.org

* Package name    : llama.cpp
  Version         : b2116
  Upstream Author : Georgi Gerganov
* URL             : https://github.com/ggerganov/llama.cpp
* License         : MIT
  Programming Lang: C++
  Description     : Inference of Meta's LLaMA model (and others) in pure C/C++

The main goal of llama.cpp is to enable LLM inference with minimal
setup and state-of-the-art performance on a wide variety of hardware -
locally and in the cloud.

* Plain C/C++ implementation without any dependencies
* Apple silicon is a first-class citizen - optimized via ARM NEON,
  Accelerate and Metal frameworks
* AVX, AVX2 and AVX512 support for x86 architectures
* 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization for
  faster inference and reduced memory use
* Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD
  GPUs via HIP)
* Vulkan, SYCL, and (partial) OpenCL backend support
* CPU+GPU hybrid inference to partially accelerate models larger than
  the total VRAM capacity
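
To give a feel for the library interface this package would ship, here is a
minimal sketch against the C API in llama.h as of this upstream tag. It is
illustrative only: the model path is a placeholder, and exact signatures vary
between releases (e.g. backend init/free calls are omitted here because their
prototypes changed around this tag).

    /*
     * load_info.c -- illustrative sketch, not part of the upstream sources.
     * Assumes the C API in llama.h roughly as of tag b2116; the GGUF model
     * path is passed on the command line.
     */
    #include <stdio.h>
    #include "llama.h"

    int main(int argc, char ** argv) {
        if (argc < 2) {
            fprintf(stderr, "usage: %s <model.gguf>\n", argv[0]);
            return 1;
        }

        /* Start from upstream defaults; setting n_gpu_layers > 0 would
         * request the partial GPU offload mentioned above when a GPU
         * backend is compiled in. */
        struct llama_model_params mparams = llama_model_default_params();
        mparams.n_gpu_layers = 0;

        struct llama_model * model = llama_load_model_from_file(argv[1], mparams);
        if (model == NULL) {
            fprintf(stderr, "failed to load %s\n", argv[1]);
            return 1;
        }

        /* Two properties read from the GGUF metadata. */
        printf("vocabulary size  : %d\n", llama_n_vocab(model));
        printf("training context : %d\n", llama_n_ctx_train(model));

        llama_free_model(model);
        return 0;
    }

Upstream also ships small example tools (main, quantize, server, ...).
Roughly, quantizing a 16-bit GGUF model to 4 bits and then running it with
part of the layers offloaded to the GPU looks like this (tool and flag names
as in the upstream README for this version; file names are placeholders):

    ./quantize model-f16.gguf model-q4_0.gguf q4_0
    ./main -m model-q4_0.gguf -p "The quick brown fox" -n 64 -ngl 20

where -ngl sets how many layers go to the GPU, with the remainder evaluated
on the CPU (the hybrid mode mentioned above).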

This package will be maintained by the Debian Deep Learning Team.

