Can this model be used with vLLM?

by ChloeHuang1 - opened

Can this model be used with vLLM?

Yes. I'm getting 60+ tokens/s (single user) on a 3090.
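In case it helps anyone landing on this thread, here's a minimal sketch of offline inference with vLLM's Python API. The repo ID is a placeholder for this model, and the sampling settings are illustrative, not the exact setup that produced the numbers above:

```python
from vllm import LLM, SamplingParams

# Placeholder repo ID -- substitute the actual model from this page.
llm = LLM(model="org/this-model", gpu_memory_utilization=0.90)

params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=256)
outputs = llm.generate(["Explain what vLLM does in one sentence."], params)

# Each RequestOutput holds one or more completions; print the first one.
print(outputs[0].outputs[0].text)
```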

@twhitworth What context length are you running?

I'm trying vLLM for the first time (I'm used to Ollama), so I'd appreciate any pointers on your onboarding routine when trying new models.

I tried running it on Windows via WSL2; with a 3090 I get ~35 tokens/s using a context length of 5700.
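For what it's worth, capping the context length is the usual knob when VRAM is tight on a 24 GB card. A rough sketch, reusing the placeholder repo ID from above; the 5700 value just mirrors the number mentioned here and the other settings are illustrative:

```python
from vllm import LLM

# Cap the context window so the KV cache fits alongside the weights on a 24 GB 3090.
# max_model_len and gpu_memory_utilization are illustrative, not a verified config.
llm = LLM(
    model="org/this-model",        # placeholder -- use the actual repo ID
    max_model_len=5700,
    gpu_memory_utilization=0.90,
    dtype="half",
)
```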
