vllm

#10
by regzhang - opened

Can the vLLM inference framework run inference with this model? How would it need to be adjusted or configured to run on a setup with 8 Nvidia RTX 3090 GPUs?

Owner

It uses the Mixtral architecture, which is supported by vLLM, but I have no idea how to set it up with 8 Nvidia RTX 3090 GPUs.

I think you are looking for this: https://github.com/vllm-project/vllm/pull/2293
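For the multi-GPU part, a minimal sketch of what this could look like with vLLM's Python API, assuming tensor parallelism across the 8 cards; the model path, dtype, and sampling settings below are placeholders, not values confirmed in this thread:

```python
# Sketch: serve a Mixtral-architecture model with vLLM sharded over 8 GPUs.
# Assumptions: the model fits in 8 x 24 GB when sharded, and fp16 is used
# since RTX 3090s are Ampere consumer cards.
from vllm import LLM, SamplingParams

llm = LLM(
    model="path/or/repo-id-of-this-model",  # placeholder, replace with the actual model
    tensor_parallel_size=8,                 # shard weights across the 8 GPUs
    dtype="half",                           # fp16
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Hello, how are you?"], params)
print(outputs[0].outputs[0].text)
```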
