
Want some advice!



Hi, I'm tackling the GPU version of vLLM and have run into a difficult part.

I read through the vLLM build system (https://github.com/vllm-project/vllm) and found that it uses two external projects: FlashMLA and vllm-flash-attn.
Both dependencies are cloned from the internet during the build. For vllm-flash-attn, this happens in the following file:

cmake/external_projects/vllm_flash_attn.cmake
 
```
FetchContent_Declare(
  vllm-flash-attn
  GIT_REPOSITORY https://github.com/vllm-project/flash-attention.git
  GIT_TAG dc9d410b3e2d6534a4c70724c2515f4def670a22
  GIT_PROGRESS TRUE
  # Don't share the vllm-flash-attn build between build types
  BINARY_DIR ${CMAKE_BINARY_DIR}/vllm-flash-attn
)
```
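
For a build without network access, one possible approach is CMake's documented FetchContent override, which points a declared dependency at a local source tree instead of cloning it. The fragment below is only a sketch under that assumption: the path /path/to/flash-attention is hypothetical, and nothing here is taken from the vLLM documentation. It would have to be set before the FetchContent_Declare call above runs, or passed as -D options on the cmake command line.

```cmake
# Sketch only: make FetchContent use a pre-downloaded source tree.

# Tell FetchContent not to attempt any download or update step.
set(FETCHCONTENT_FULLY_DISCONNECTED ON CACHE BOOL "Do not access the network")

# FETCHCONTENT_SOURCE_DIR_<UPPERCASED_NAME> overrides GIT_REPOSITORY/GIT_TAG
# for the dependency declared as "vllm-flash-attn".
# The path is hypothetical; it should contain the fork checked out at the
# pinned commit (dc9d410b3e2d6534a4c70724c2515f4def670a22).
set(FETCHCONTENT_SOURCE_DIR_VLLM-FLASH-ATTN
    "/path/to/flash-attention"
    CACHE PATH "Local checkout of vllm-project/flash-attention")
```

With this override, the source would come from the local directory while the rest of the declaration (such as BINARY_DIR) stays in effect.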

First, FlashMLA: the build file says "Only build FlashMLA kernels if we are building for something compatible with", so we can skip that one.

Second, vllm-flash-attn is a bit odd (https://github.com/vllm-project/flash-attention.git).
I successfully built the original flash-attn (https://github.com/Dao-AILab/flash-attention). However, vllm-flash-attn is a fork that includes extra files for vLLM (such as the vllm_flash_attn directory).
How should I handle these files in this case? Do I need to package flash-attn and vllm-flash-attn separately?


------------------------------------------------------------------------------------------------------

Kohei Sendai

-------------------------------------------------------------------------------------------

