Hi folks,
I was playing around with llama.cpp yesterday. I thought I'd
share instructions for building and running with AMD GPU
acceleration on Debian Testing and Unstable (or Ubuntu 23.10). I
believe this should work on most discrete AMD GPUs with sufficient
VRAM released between 2017 and 2022, specifically Vega, RDNA 1,
RDNA 2, CDNA 1 and CDNA 2 GPUs:
apt -y update
apt -y upgrade
apt -y install git wget hipcc libhipblas-dev librocblas-dev cmake build-essential
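Before building, it may be worth confirming that the ROCm runtime actually sees the card. The rocminfo tool (in the rocminfo package, which the list above does not pull in) prints the ISA name each GPU reports; it should match one of the gfx targets passed to cmake:

```shell
# Optional sanity check (needs: apt -y install rocminfo).
# Prints the ISA name(s) the runtime reports, e.g. gfx1030 for an RX 6800.
rocminfo | grep -Eo 'gfx[0-9a-f]+' | sort -u
```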
wget 'https://huggingface.co/TheBloke/dolphin-2.2.1-mistral-7B-GGUF/resolve/main/dolphin-2.2.1-mistral-7b.Q5_K_M.gguf?download=true' -O dolphin-2.2.1-mistral-7b.Q5_K_M.gguf
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
git checkout b2110
CC=clang-15 CXX=clang++-15 cmake -H. -Bbuild -DLLAMA_HIPBLAS=ON \
    -DAMDGPU_TARGETS="gfx803;gfx900;gfx906;gfx908;gfx90a;gfx1010;gfx1030" \
    -DCMAKE_BUILD_TYPE=Release
make -j16 -C build
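If your card's ISA isn't in the target list above (for example, some RDNA 2 laptop and small desktop chips report gfx1031 or gfx1032), the ROCm runtime can often be told to use the nearest supported ISA via an environment variable. This is an unofficial workaround rather than part of the build, and it's only safe when the real ISA is binary-compatible with the one you claim:

```shell
# Unofficial workaround: make the ROCm runtime treat the GPU as gfx1030
# (10.3.0). Reasonable for gfx1031/gfx1032; do not use across generations.
HSA_OVERRIDE_GFX_VERSION=10.3.0 build/bin/main -ngl 32 \
    -m ../dolphin-2.2.1-mistral-7b.Q5_K_M.gguf --prompt "Once upon a time"
```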
build/bin/main -ngl 32 --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -m ../dolphin-2.2.1-mistral-7b.Q5_K_M.gguf --prompt "Once upon a time"
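For reference, -ngl is the number of model layers offloaded to the GPU (a 7B Mistral model has 32 layers, so -ngl 32 offloads the whole model), -c is the context size, and -n -1 keeps generating until stopped. The Q5_K_M weights are roughly 5 GB, so if they don't fit in your VRAM you can offload fewer layers and let the remainder run on the CPU:

```shell
# Partial offload for smaller cards: roughly half the layers on the GPU,
# the rest on the CPU. Slower, but fits in less VRAM.
build/bin/main -ngl 16 --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 \
    -m ../dolphin-2.2.1-mistral-7b.Q5_K_M.gguf --prompt "Once upon a time"
```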
I'm not familiar enough with llama.cpp to file an RFS yet. Still,
I thought this was interesting and wanted to share.
Sincerely,
Cory Bloor