Hi, I'm working on the GPU version of vLLM and have run into some difficult parts.
The following two dependencies are cloned from the internet during the build, in this file:
.cmake/external_projects/vllm_flash_attn.cmake
```
FetchContent_Declare(
  vllm-flash-attn
  GIT_REPOSITORY https://github.com/vllm-project/flash-attention.git
  GIT_TAG dc9d410b3e2d6534a4c70724c2515f4def670a22
  GIT_PROGRESS TRUE
  # Don't share the vllm-flash-attn build between build types
  BINARY_DIR ${CMAKE_BINARY_DIR}/vllm-flash-attn
)
```
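For context, CMake's FetchContent module documents a per-dependency override variable, `FETCHCONTENT_SOURCE_DIR_<uppercaseName>`, which makes it use an existing source tree instead of cloning. The following is only a minimal, untested sketch of how that might apply here. The local path is a hypothetical placeholder, and I am assuming the override is set before the declaration above is processed.

```cmake
# Hypothetical location of a pre-downloaded vllm-flash-attn source tree
# (an assumption for illustration; adjust to wherever the sources really are).
set(LOCAL_VLLM_FLASH_ATTN_DIR "/path/to/vllm-flash-attn")

# FetchContent skips the download step for a dependency when
# FETCHCONTENT_SOURCE_DIR_<uppercaseName> is set. The declared name is
# "vllm-flash-attn", so the upper-cased suffix is "VLLM-FLASH-ATTN".
set(FETCHCONTENT_SOURCE_DIR_VLLM-FLASH-ATTN
    "${LOCAL_VLLM_FLASH_ATTN_DIR}"
    CACHE PATH "Use a local vllm-flash-attn checkout instead of cloning")
```

The same variable could also be passed on the command line as `-DFETCHCONTENT_SOURCE_DIR_VLLM-FLASH-ATTN=/path/to/vllm-flash-attn`. This only avoids the network access and does not by itself answer whether the fork should be a separate package.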
First, FlashMLA is marked "Only build FlashMLA kernels if we are building for something compatible with", so it is built conditionally and we can skip the first dependency.
I successfully built the original version of flash-attn (https://github.com/Dao-AILab/flash-attention). However, vllm-flash-attn is a fork that includes some additional files for vLLM (such as the vllm_flash_attn directory).
In this case, how should I handle these files? Do I need to package flash-attn and vllm-flash-attn separately?
------------------------------------------------------------------------------------------------------
Kohei Sendai
-------------------------------------------------------------------------------------------