Can VLLM be used for inference acceleration?
"architectures": [ "MixtralForCausalLM" ],you need to check whether vllm support "MixtralForCausalLM"
Yeah, vLLM supports that architecture.
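For reference, here's a minimal sketch of running this model through vLLM's offline `LLM` API; the model ID and sampling parameters below are illustrative, so substitute the checkpoint you actually want to serve:

```python
from vllm import LLM, SamplingParams

# Illustrative Mixtral checkpoint; replace with your model path or hub ID.
llm = LLM(model="mistralai/Mixtral-8x7B-Instruct-v0.1")

# Sampling settings are arbitrary examples.
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain what vLLM does in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```

If vLLM doesn't recognize the architecture, construction of `LLM` will fail with an unsupported-architecture error, which is a quick way to verify support for your installed version.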