Can this model be used with vLLM?

by ChloeHuang1 - opened

Can this model be used with vLLM?

Yes. I'm getting 60+ tokens/s (single user) on a 3090.
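In case it helps anyone landing on this thread, here's a minimal sketch of offline inference with vLLM's Python API. The repo ID is a placeholder for this model, and the sampling settings are illustrative, not the exact setup that produced the numbers above:

```python
from vllm import LLM, SamplingParams

# Placeholder repo ID -- substitute the actual model from this page.
llm = LLM(model="org/this-model", gpu_memory_utilization=0.90)

params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=256)
outputs = llm.generate(["Explain what vLLM does in one sentence."], params)

# Each RequestOutput holds one or more completions; print the first one.
print(outputs[0].outputs[0].text)
```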

@twhitworth What context length are you running?

I'm trying vLLM for the first time (I'm used to Ollama), so I'd appreciate any pointers on your onboarding routine when trying new models.

I tried running it on Windows via WSL2; with a 3090 I get ~35 tokens/s using a context length of 5700.
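For what it's worth, capping the context length is the usual knob when VRAM is tight on a 24 GB card. A rough sketch, reusing the placeholder repo ID from above; the 5700 value just mirrors the number mentioned here and the other settings are illustrative:

```python
from vllm import LLM

# Cap the context window so the KV cache fits alongside the weights on a 24 GB 3090.
# max_model_len and gpu_memory_utilization are illustrative, not a verified config.
llm = LLM(
    model="org/this-model",        # placeholder -- use the actual repo ID
    max_model_len=5700,
    gpu_memory_utilization=0.90,
    dtype="half",
)
```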
