Can this model be used with vLLM?
#2
by ChloeHuang1 - opened
Can this model be used with vLLM?
Yes. I am getting 60+ tokens/s (single user) on a 3090.
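For anyone who wants to try it, here is a minimal offline-inference sketch with vLLM's Python API. `org/model-name` is a placeholder, not this model's actual repo ID, so substitute the real one; the sampling settings are just example values.

```python
# Minimal vLLM offline-inference sketch (placeholder model ID, example sampling values).
from vllm import LLM, SamplingParams

MODEL_ID = "org/model-name"  # placeholder: replace with this model's actual repo ID

llm = LLM(model=MODEL_ID)  # loads the weights onto the GPU
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["Hello, how are you?"], params)
print(outputs[0].outputs[0].text)  # first completion for the first prompt
```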
@twhitworth What context length are you running?
I'm trying vLLM for the first time (I'm used to Ollama), so I'd appreciate any pointers on your onboarding routine when trying new models.
I tried running it on Windows via WSL2; with a 3090 I get ~35 tokens/s using a context length of 5700. A sketch of how I cap the context length is below.
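Roughly, this is how the context length can be capped so the KV cache fits in a 3090's 24 GB of VRAM. The model ID is again a placeholder, and `gpu_memory_utilization=0.90` is an assumed value you may need to tune, not something stated above.

```python
# Sketch: constrain context length so the KV cache fits on a 24 GB card (e.g. RTX 3090).
# max_model_len=5700 mirrors the setting mentioned above; other values are assumptions.
from vllm import LLM

llm = LLM(
    model="org/model-name",        # placeholder: replace with this model's actual repo ID
    max_model_len=5700,            # cap the context window to limit KV-cache memory
    gpu_memory_utilization=0.90,   # fraction of VRAM vLLM may claim (assumed, tune as needed)
)
```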