is it possible to use a continuous batching inference server with this model?
#14 by natserrano
vLLM doesn't work with this model.
Any other recommendations for achieving 10 calls per second?
Any other AWQ models similar/comparable to this bad boy?
Thanks!
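
For anyone landing here: recent vLLM builds do support AWQ checkpoints via the `quantization="awq"` option, so it may be worth retrying before switching servers. Below is a minimal sketch (the model repo id is a placeholder, not this specific model); vLLM's engine applies continuous batching across concurrent requests automatically:

```python
from vllm import LLM, SamplingParams

# Load an AWQ-quantized checkpoint; the repo id here is a placeholder.
llm = LLM(
    model="TheBloke/Some-Model-AWQ",
    quantization="awq",
    dtype="float16",
)

sampling = SamplingParams(temperature=0.7, max_tokens=128)

# Submitting many prompts at once lets the scheduler batch them
# continuously, which is what sustains high requests-per-second.
prompts = ["Hello, world!"] * 8
outputs = llm.generate(prompts, sampling)
for out in outputs:
    print(out.outputs[0].text)
```

For serving over HTTP, vLLM also ships an OpenAI-compatible server (`python -m vllm.entrypoints.openai.api_server --model <repo> --quantization awq`), which batches concurrent client calls the same way.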