Want some advice!

Hi, I'm tackling the GPU version of vLLM and have run into a difficult part.

I read through the vLLM build system (https://github.com/vllm-project/vllm) and found that it uses two external projects: FlashMLA and vllm-flash-attn.

Both dependencies are cloned from the internet during the build. For vllm-flash-attn, this happens in the following part of

cmake/external_projects/vllm_flash_attn.cmake

```cmake
FetchContent_Declare(
  vllm-flash-attn
  GIT_REPOSITORY https://github.com/vllm-project/flash-attention.git
  GIT_TAG dc9d410b3e2d6534a4c70724c2515f4def670a22
  GIT_PROGRESS TRUE
  # Don't share the vllm-flash-attn build between build types
  BINARY_DIR ${CMAKE_BINARY_DIR}/vllm-flash-attn
)
```
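As far as I understand, CMake's FetchContent module has a standard override for this kind of network fetch, so the clone step might be avoidable in an offline build. The following is only a sketch under assumptions: the source path is hypothetical, and I am assuming the tree it points to is a checkout of vllm-project/flash-attention at the pinned GIT_TAG above. I have not confirmed that the rest of the vLLM build behaves correctly with it.

```cmake
# Sketch only: set these before FetchContent_Declare() is reached, or pass
# them on the cmake command line with -D. The path below is hypothetical.

# Tell FetchContent never to download or update anything during configure.
set(FETCHCONTENT_FULLY_DISCONNECTED ON CACHE BOOL
    "Do not download external projects during the build")

# Point FetchContent at an already unpacked source tree. The suffix is the
# declared name (vllm-flash-attn) converted to upper case.
set(FETCHCONTENT_SOURCE_DIR_VLLM-FLASH-ATTN
    "/path/to/unpacked/flash-attention"
    CACHE PATH "Pre-fetched vllm-project/flash-attention source tree")
```

When FETCHCONTENT_SOURCE_DIR_VLLM-FLASH-ATTN is set, FetchContent uses that directory as the source and ignores the GIT_REPOSITORY and GIT_TAG arguments, so keeping the local tree at the pinned commit would be my responsibility.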

First, FlashMLA: its build file says "Only build FlashMLA kernels if we are building for something compatible with", so we can skip this one.

Second, vllm-flash-attn (https://github.com/vllm-project/flash-attention.git) is a bit odd.

I successfully built the original flash-attn (https://github.com/Dao-AILab/flash-attention). However, vllm-flash-attn is a fork of it that includes some additional files for vLLM (such as the vllm_flash_attn directory).

In this case, how should I treat these files? Do I need to package flash-attn and vllm-flash-attn separately?

------------------------------------------------------------------------------------------------------

kouhei.sendai@gmail.com

Kohei Sendai

-------------------------------------------------------------------------------------------