is it possible to use a continuous batching inference server with this model?
#14 by natserrano
vLLM doesn't work with this model.
Any other recommendations for achieving 10 calls per second?
Any other AWQ models similar/comparable to this bad boy?
Thanks!
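
For anyone landing here: recent vLLM builds do support AWQ checkpoints via the `quantization="awq"` option, so it may be worth retrying before switching servers. Below is a minimal sketch (the model repo id is a placeholder, not this specific model); vLLM's engine applies continuous batching across concurrent requests automatically:

```python
from vllm import LLM, SamplingParams

# Load an AWQ-quantized checkpoint; the repo id here is a placeholder.
llm = LLM(
    model="TheBloke/Some-Model-AWQ",
    quantization="awq",
    dtype="float16",
)

sampling = SamplingParams(temperature=0.7, max_tokens=128)

# Submitting many prompts at once lets the scheduler batch them
# continuously, which is what sustains high requests-per-second.
prompts = ["Hello, world!"] * 8
outputs = llm.generate(prompts, sampling)
for out in outputs:
    print(out.outputs[0].text)
```

For serving over HTTP, vLLM also ships an OpenAI-compatible server (`python -m vllm.entrypoints.openai.api_server --model <repo> --quantization awq`), which batches concurrent client calls the same way.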