Sorry for the slight delay.
The big news is that I was able to build the CPU version of vLLM, and it works!
I'm really happy about this.
Of course, my next goal is the GPU version.
However, I am struggling with Ray and xformers.
First, Ray is built with Bazel. However, its build has a strict dependency on a specific version of Bazel itself, and it downloads source code from the Internet. Copying the code by hand might not be a big issue, but the Bazel version requirement is critical for me. I have been struggling with this problem for almost the whole week.
Thanks to your help, I was able to upload some packages, and I will upload more.
I will also open an MR in llama.cpp for packaging gguf-py.
I know you are busy, but I would appreciate any advice on handling Ray and xformers.
Regards,
-----------------------------------------------------------------------------------------------------
Kohei Sendai
-------------------------------------------------------------------------------------------