On 7/13/25 10:15 AM, 千代航平 wrote:
Of course, my next goal is the GPU version. However, I am struggling with ray and xformers. First, ray is built with Bazel, it has strict version requirements on Bazel itself, and it downloads source code from the Internet during the build. Copying the code in by hand might not be a big issue, but the Bazel version constraint is critical for me. I have been struggling with this problem for almost a week...
ray is not mandatory: https://github.com/vllm-project/vllm/blob/3fc964433a84bad785d9d0656fd56195462321b8/vllm/config.py#L2098-L2142
According to that code, ray is only needed for multi-node distributed inference. If we disable ray, we can still do single-node inference with multiple GPUs. Let's figure out a way to disable ray while building vllm-cuda, so that we can avoid Bazel.
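For illustration, a minimal sketch of what single-node multi-GPU inference without ray could look like, assuming a vLLM build that supports the multiprocessing executor backend; the model name and tensor_parallel_size below are just placeholders:

    # Single-node, multi-GPU inference that never touches ray:
    # selecting the "mp" (multiprocessing) executor backend explicitly
    # keeps the config logic linked above from falling back to ray.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="facebook/opt-125m",           # placeholder model
        tensor_parallel_size=2,              # two GPUs on one node
        distributed_executor_backend="mp",   # multiprocessing backend, no ray
    )

    outputs = llm.generate(
        ["Hello, my name is"],
        SamplingParams(max_tokens=32),
    )
    print(outputs[0].outputs[0].text)

If something like this works, the ray (and hence Bazel) dependency could simply be dropped from the vllm-cuda package inputs, since nothing would import it at runtime.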