Re: Want some advice !
Hi Kohei,
It’s great to see your progress!
I have some additional comments to Mo's:
> On Aug 9, 2025, at 07:43, M. Zhou <cdluminate@riseup.net> wrote:
>
> Hi Kohei,
>
> Thanks for the updates! See my comments below:
>
> On Fri, 2025-08-08 at 21:32 +0900, 千代航平 wrote:
>> Hi, I'm tackling the GPU version of vllm and have run into some difficult parts.
>>
>> I read and tried to understand the vllm build system (https://github.com/vllm-project/vllm); it uses two external projects (flash mla and vllm-flash-attn).
>> These two dependencies are cloned from the internet during the build, in this file:
>>
>> cmake/external_projects/vllm_flash_attn.cmake
>>
>> ```
>> FetchContent_Declare(
>>   vllm-flash-attn
>>   GIT_REPOSITORY https://github.com/vllm-project/flash-attention.git
>>   GIT_TAG dc9d410b3e2d6534a4c70724c2515f4def670a22
>>   GIT_PROGRESS TRUE
>>   # Don't share the vllm-flash-attn build between build types
>>   BINARY_DIR ${CMAKE_BINARY_DIR}/vllm-flash-attn
>> )
>> ```
>
> Generally we have to patch out all the FetchContent downloads in cmake to make sure
> nothing will be downloaded from the internet. This is required by Debian Policy.
>
> PyTorch has done a lot of patching work to strip these downloads and third-party directories.
> Its solution is messy:
> https://salsa.debian.org/deeplearning-team/pytorch/-/blob/master/debian/patches/1020-dirtyhack.patch?ref_type=heads
>
> For a cleanly organized example, please see onnxruntime, e.g.,
> https://salsa.debian.org/deeplearning-team/onnxruntime/-/blob/master/debian/patches/system-benchmark.patch?ref_type=heads
Just a comment: patching out FetchContent_MakeAvailable is indeed sufficient, since
FetchContent_Declare alone only records metadata and downloads nothing, so leaving it in
place is typically harmless.
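For illustration, a minimal patch in that spirit might look like the sketch below; the
local path is a hypothetical example of where a copy could live, not the actual layout:

```
# The declaration can stay: it only records metadata, nothing is
# downloaded at this point.
FetchContent_Declare(
  vllm-flash-attn
  GIT_REPOSITORY https://github.com/vllm-project/flash-attention.git
  GIT_TAG dc9d410b3e2d6534a4c70724c2515f4def670a22
)

# The download happens inside FetchContent_MakeAvailable(), so that is
# the call to patch out, e.g.:
# FetchContent_MakeAvailable(vllm-flash-attn)
add_subdirectory(
  ${CMAKE_SOURCE_DIR}/debian/vllm-flash-attn   # hypothetical local copy
  ${CMAKE_BINARY_DIR}/vllm-flash-attn
)
```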
Also, Debian's debhelper now passes -DFETCHCONTENT_FULLY_DISCONNECTED=ON to cmake in
dh_auto_configure [1] (if you are using the correct dh helper), so leaving the
FetchContent calls as-is should also be safe now. Nonetheless, you can always patch
them out anyway, though that may mean some rebasing work when importing new upstream
releases.
[1]: https://salsa.debian.org/debian/debhelper/-/blob/main/lib/Debian/Debhelper/Buildsystem/cmake.pm?ref_type=heads
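To sketch how that mechanism behaves (the variable names are from CMake's FetchContent
documentation; the path is just a placeholder):

```
# With -DFETCHCONTENT_FULLY_DISCONNECTED=ON, as dh_auto_configure now
# passes, FetchContent_MakeAvailable() skips all download/update steps
# and assumes the sources are already in place. A per-dependency local
# tree can be supplied at configure time without any patching:
#
#   -DFETCHCONTENT_SOURCE_DIR_VLLM-FLASH-ATTN=/path/to/local/copy
#
# With that set, the unpatched
#   FetchContent_MakeAvailable(vllm-flash-attn)
# behaves roughly like:
add_subdirectory(/path/to/local/copy ${CMAKE_BINARY_DIR}/vllm-flash-attn)
```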
>
>> First, for flash mla, the cmake comment says "Only build FlashMLA kernels if we are building for something compatible with", so I think we can skip the first one.
>
> This sentence is vague. Is flash mla used by vLLM by default (especially in their
> official binary release)? What does "something compatible with" mean?
I think it means “when building for Hopper GPUs”? Then we might skip it for now.
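If that reading is right, the guard is presumably something along these lines (a
paraphrase of the idea, not the literal upstream code):

```
# Build the FlashMLA kernels only when targeting Hopper (sm90a);
# otherwise skip them entirely. Paraphrased, not upstream's exact check:
if("90a" IN_LIST CMAKE_CUDA_ARCHITECTURES)
  # ... declare and build the FlashMLA targets ...
else()
  message(STATUS "Skipping FlashMLA: no compatible CUDA arch requested")
endif()
```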
>> Second, vllm-flash-attn is a bit weird: https://github.com/vllm-project/flash-attention.git
>> I successfully built the original flash-attn (https://github.com/Dao-AILab/flash-attention); however, vllm-flash-attn is a forked version that adds some files for vllm (such as the vllm_flash_attn
>> directory).
>
> Generally, when cmake downloads vllm-flash-attn, we should package it separately and patch
> cmake to use the system copy. However, vllm-flash-attn is a fork, and very likely vllm is
> the only user of this fork. The simplest reasonable way to handle
> vllm-flash-attn is an embedded source copy, like this:
>
> https://salsa.debian.org/deeplearning-team/pytorch/-/tree/master/debian/pocketfft-9d3ab05a7fffbc71a492bc6a17be034e83e8f0fe?ref_type=heads
>
> After importing this embedded copy, make sure to update d/copyright for it, and
> patch cmake to redirect the download to this embedded source copy.
>
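To make that last step concrete: the redirect can be a small patch that points
FetchContent_Declare at the embedded copy via SOURCE_DIR instead of a git download
(the directory name below just mimics the pocketfft-<sha> convention from the pytorch
example above and is hypothetical):

```
FetchContent_Declare(
  vllm-flash-attn
  # GIT_REPOSITORY / GIT_TAG dropped: build from the embedded copy instead
  SOURCE_DIR ${CMAKE_SOURCE_DIR}/debian/vllm-flash-attn-dc9d410b
  # Don't share the vllm-flash-attn build between build types
  BINARY_DIR ${CMAKE_BINARY_DIR}/vllm-flash-attn
)
```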
>> In this case, how should I treat these files? Do I need to package flash-attn and vllm-flash-attn separately?
>
> vllm-flash-attn can be embedded. The original flash-attn is not used by
> vllm, so we don't have to touch it.
Thanks,
Shengqi Chen