Bug#1063673: ITP: llama.cpp -- Inference of Meta's LLaMA model (and others) in pure C/C++
Package: wnpp
Severity: wishlist
Owner: Christian Kastner <ckk@debian.org>
X-Debbugs-Cc: debian-devel@lists.debian.org, debian-ai@lists.debian.org

* Package name    : llama.cpp
  Version         : b2116
  Upstream Author : Georgi Gerganov
* URL             : https://github.com/ggerganov/llama.cpp
* License         : MIT
  Programming Lang: C++
  Description     : Inference of Meta's LLaMA model (and others) in pure C/C++

The main goal of llama.cpp is to enable LLM inference with minimal
setup and state-of-the-art performance on a wide variety of hardware -
locally and in the cloud.
* Plain C/C++ implementation without any dependencies
* Apple silicon is a first-class citizen - optimized via ARM NEON,
Accelerate and Metal frameworks
* AVX, AVX2 and AVX512 support for x86 architectures
* 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization for
faster inference and reduced memory use
* Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD
GPUs via HIP)
* Vulkan, SYCL, and (partial) OpenCL backend support
* CPU+GPU hybrid inference to partially accelerate models larger than
the total VRAM capacity
This package will be maintained by the Debian Deep Learning Team.
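
As a quick illustration of what the library offers consumers (a minimal
sketch, not part of the proposed package description; it assumes the
llama.h header as shipped around upstream b2116, whose function names may
change in later releases):

    // Minimal sketch: load a GGUF model through llama.h and query its
    // vocabulary size. A real program would also create a context,
    // tokenize a prompt, and run decoding; those steps are omitted here
    // for brevity.
    #include <cstdio>
    #include "llama.h"

    int main(int argc, char ** argv) {
        if (argc < 2) {
            std::fprintf(stderr, "usage: %s <model.gguf>\n", argv[0]);
            return 1;
        }

        llama_model_params mparams = llama_model_default_params();
        mparams.n_gpu_layers = 0;  // CPU only; raise to offload layers to a GPU backend

        llama_model * model = llama_load_model_from_file(argv[1], mparams);
        if (model == nullptr) {
            std::fprintf(stderr, "failed to load model: %s\n", argv[1]);
            return 1;
        }

        std::printf("vocab size: %d\n", llama_n_vocab(model));

        llama_free_model(model);
        return 0;
    }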