Re: Want some advice !
Hi Kohei,
It’s great to see your progress!
I have some additional comments to Mo's:
> On Aug 9, 2025, at 07:43, M. Zhou <cdluminate@riseup.net> wrote:
>
> Hi Kohei,
>
> Thanks for the updates! See my comments below:
>
> On Fri, 2025-08-08 at 21:32 +0900, 千代航平 wrote:
>> Hi, I'm tackling the GPU version of vllm and have run into some difficult parts.
>>
>> I read and tried to understand the vllm build system (https://github.com/vllm-project/vllm); it uses two external projects (flash mla and vllm-flash-attn).
>> These two dependencies are cloned from the internet during the build, in this file:
>>
>> cmake/external_projects/vllm_flash_attn.cmake
>>
>> ```
>> FetchContent_Declare(
>>   vllm-flash-attn
>>   GIT_REPOSITORY https://github.com/vllm-project/flash-attention.git
>>   GIT_TAG dc9d410b3e2d6534a4c70724c2515f4def670a22
>>   GIT_PROGRESS TRUE
>>   # Don't share the vllm-flash-attn build between build types
>>   BINARY_DIR ${CMAKE_BINARY_DIR}/vllm-flash-attn
>> )
>> ```
>
> Generally we have to patch out all the FetchContent downloads in cmake to make sure
> nothing will be downloaded from the internet. This is required by Debian Policy.
>
> PyTorch has done a lot of patching work to strip these downloads and third-party directories.
> Its solution is messy:
> https://salsa.debian.org/deeplearning-team/pytorch/-/blob/master/debian/patches/1020-dirtyhack.patch?ref_type=heads
>
> For a cleanly organized example, please see onnxruntime, e.g.,
> https://salsa.debian.org/deeplearning-team/onnxruntime/-/blob/master/debian/patches/system-benchmark.patch?ref_type=heads
Just a comment: patching out FetchContent_MakeAvailable is indeed sufficient, since
FetchContent_Declare alone only records metadata and downloads nothing, so leaving it in
place is typically harmless.
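For illustration, a minimal patch in that spirit might look like the sketch below; the
local path is a hypothetical example of where a copy could live, not the actual layout:

```
# The declaration can stay: it only records metadata, nothing is
# downloaded at this point.
FetchContent_Declare(
  vllm-flash-attn
  GIT_REPOSITORY https://github.com/vllm-project/flash-attention.git
  GIT_TAG dc9d410b3e2d6534a4c70724c2515f4def670a22
)

# The download happens inside FetchContent_MakeAvailable(), so that is
# the call to patch out, e.g.:
# FetchContent_MakeAvailable(vllm-flash-attn)
add_subdirectory(
  ${CMAKE_SOURCE_DIR}/debian/vllm-flash-attn   # hypothetical local copy
  ${CMAKE_BINARY_DIR}/vllm-flash-attn
)
```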
Also, Debian's debhelper now passes -DFETCHCONTENT_FULLY_DISCONNECTED=ON to cmake in
dh_auto_configure [1] (if you are using the correct dh helper), so leaving the
FetchContent calls as-is should also be safe now. Nonetheless, you can always patch
them out anyway, though that may mean some rebasing work when importing new upstream
releases.
[1]: https://salsa.debian.org/debian/debhelper/-/blob/main/lib/Debian/Debhelper/Buildsystem/cmake.pm?ref_type=heads
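To sketch how that mechanism behaves (the variable names are from CMake's FetchContent
documentation; the path is just a placeholder):

```
# With -DFETCHCONTENT_FULLY_DISCONNECTED=ON, as dh_auto_configure now
# passes, FetchContent_MakeAvailable() skips all download/update steps
# and assumes the sources are already in place. A per-dependency local
# tree can be supplied at configure time without any patching:
#
#   -DFETCHCONTENT_SOURCE_DIR_VLLM-FLASH-ATTN=/path/to/local/copy
#
# With that set, the unpatched
#   FetchContent_MakeAvailable(vllm-flash-attn)
# behaves roughly like:
add_subdirectory(/path/to/local/copy ${CMAKE_BINARY_DIR}/vllm-flash-attn)
```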
>
>> First, for flash mla, the cmake comment says "Only build FlashMLA kernels if we are building for something compatible with", so I think we can skip the first one.
>
> This sentence is vague. Is flash mla used by vLLM by default (especially in their
> official binary release)? What does "something compatible with" mean?
I think it means “when building for Hopper GPUs”? Then we might skip it for now.
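If that reading is right, the guard is presumably something along these lines (a
paraphrase of the idea, not the literal upstream code):

```
# Build the FlashMLA kernels only when targeting Hopper (sm90a);
# otherwise skip them entirely. Paraphrased, not upstream's exact check:
if("90a" IN_LIST CMAKE_CUDA_ARCHITECTURES)
  # ... declare and build the FlashMLA targets ...
else()
  message(STATUS "Skipping FlashMLA: no compatible CUDA arch requested")
endif()
```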
>> Second, vllm-flash-attn is a bit weird: https://github.com/vllm-project/flash-attention.git
>> I successfully built the original flash-attn (https://github.com/Dao-AILab/flash-attention); however, vllm-flash-attn is a forked version that adds some files for vllm (such as the vllm_flash_attn
>> directory).
>
> Generally, when cmake downloads vllm-flash-attn, we should package it separately and patch
> cmake to use the system copy. However, vllm-flash-attn is a fork, and very likely vllm is
> the only user of this fork. The simplest reasonable way to handle
> vllm-flash-attn is an embedded source copy, like this:
>
> https://salsa.debian.org/deeplearning-team/pytorch/-/tree/master/debian/pocketfft-9d3ab05a7fffbc71a492bc6a17be034e83e8f0fe?ref_type=heads
>
> After importing this embedded copy, make sure to update d/copyright for it, and
> patch cmake to redirect the download to this embedded source copy.
>
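To make that last step concrete: the redirect can be a small patch that points
FetchContent_Declare at the embedded copy via SOURCE_DIR instead of a git download
(the directory name below just mimics the pocketfft-<sha> convention from the pytorch
example above and is hypothetical):

```
FetchContent_Declare(
  vllm-flash-attn
  # GIT_REPOSITORY / GIT_TAG dropped: build from the embedded copy instead
  SOURCE_DIR ${CMAKE_SOURCE_DIR}/debian/vllm-flash-attn-dc9d410b
  # Don't share the vllm-flash-attn build between build types
  BINARY_DIR ${CMAKE_BINARY_DIR}/vllm-flash-attn
)
```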
>> In this case, how should I treat these files? Do I need to package flash-attn and vllm-flash-attn separately?
>
> vllm-flash-attn can be embedded. The original flash-attn is not used by
> vllm, so we don't have to touch it.
Thanks,
Shengqi Chen