
Re: bi-weekly update



(Re-sending with debian-ai in CC... Sorry, Kohei)

Hi Kohei,

On 2025-07-13 16:15, 千代航平 wrote:
> Sorry for the slight delay.
> 
> Now, the big news is that I was able to build the CPU version of
> vllm, and it works!
> I'm really happy about this.

That is fantastic news!
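
In case it helps with your packaging checks, here is the kind of
minimal smoke test I would run against a CPU build. This is only a
sketch based on the upstream vLLM quickstart, untested against your
package, and the model name is just a small example:

    # Minimal vLLM smoke test (sketch; assumes the upstream Python API).
    from vllm import LLM, SamplingParams

    # facebook/opt-125m is just an example of a small, CPU-friendly model.
    llm = LLM(model="facebook/opt-125m")
    params = SamplingParams(max_tokens=16)

    for out in llm.generate(["Hello, my name is"], params):
        print(out.outputs[0].text)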

> Of course, my next goal is the GPU version.
> However, I am struggling with ray and xformers.
> First, ray is built with Bazel. However, it has strict version
> dependencies on Bazel itself, and it downloads source code from the
> Internet. Copying the code by hand might not be a big issue, but the
> Bazel version is critical for me. I have been struggling with this
> problem for almost the whole week....

I haven't built anything with Bazel yet, so I'll let others chime in...
I know that Mo was one of the original Bazel wranglers.

> Also, xformers depends on flash-attention:
> https://github.com/Dao-AILab/flash-attention/tree/3ba6f826b199ff68aa9e9139a46280160defa5cd.
> I think I need to build flash-attn first, but is that right? I have
> experience using flash-attn, and I know it depends on the torch and
> CUDA versions.

From a cursory glance at setup.py, I think flash-attention as a
dependency might be skippable. I found a variable
XFORMERS_DISABLE_FLASH_ATTN that does indeed disable a lot of things,
but I'm not sure whether that is sufficient. You could give it a try.
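
For reference, the kind of guard I would expect that variable to drive
looks roughly like the following. This is a hypothetical sketch, not
the actual xformers setup.py; only the variable name comes from
upstream, and collect_flash_attn_extensions() is a made-up helper:

    import os

    # Hypothetical setup.py guard; only the variable name
    # XFORMERS_DISABLE_FLASH_ATTN is taken from upstream.
    if os.environ.get("XFORMERS_DISABLE_FLASH_ATTN", "0") == "1":
        extensions = []  # build without the flash-attention extension
    else:
        extensions = collect_flash_attn_extensions()  # made-up helper

So exporting XFORMERS_DISABLE_FLASH_ATTN=1 in the build environment
would be the first thing I would test.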

> Thanks for your help; I was able to upload some packages and will
> upload more.
> We discussed the issues at
> https://salsa.debian.org/k1000dai/gsoc-status/-/issues, and the next
> package might be ndarray. ndarray depends on
> https://packages.debian.org/search?keywords=librust-portable-atomic-util-dev,
> which the Rust team uploaded and which has been accepted into
> experimental. I think it is ready to upload.
> Also, I will send the MR in llama.cpp for packaging gguf-py.

I already pushed a change to this effect [1] recently, but would be
happy to include any improvements upon this that you might have.

I'm currently finalizing the split of llama.cpp into other sub-packages,
and will upload to experimental soon.

Best,
Christian

[1]:
https://salsa.debian.org/deeplearning-team/llama.cpp/-/commit/5996cc9ccc054826f9c799ca9d3412a7a3775cb6

