Any idea how to test this for inference using vLLM?

Opened by silvacarl

We tried every method we can think of, but we just keep getting error messages saying AWQ is not an option.

Which vLLM version are you using? This model is a safetensors model; vLLM fixes AWQ safetensors support in this PR, which is not released yet.
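
For example, a quick way to check which vLLM build is installed (just a sketch; the AWQ safetensors fix only helps if your build actually includes that PR):

```python
# Print the installed vLLM version to see whether your build
# predates the AWQ safetensors fix mentioned above.
import vllm

print(vllm.__version__)
```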

Yes, my recent AWQ readmes contain this extra info:

Note: at the time of writing, vLLM has not yet done a new release with support for the `quantization` parameter.

If you try the code below and get an error about `quantization` being unrecognised, please install vLLM from GitHub source.
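
For reference, here is a minimal inference sketch along the lines of those readme snippets (the model ID below is only a placeholder, not the actual repo):

```python
# Minimal vLLM AWQ inference sketch. The model ID is a placeholder; substitute
# the AWQ repo you are testing. If `quantization` is not recognised, install
# vLLM from GitHub source (e.g. clone https://github.com/vllm-project/vllm
# and run `pip install -e .` inside the checkout).
from vllm import LLM, SamplingParams

llm = LLM(model="TheBloke/Example-Model-AWQ", quantization="awq")

sampling_params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Tell me about AWQ quantization."], sampling_params)

for output in outputs:
    print(output.outputs[0].text)
```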

got it, will do. thx!!!!!!!!!!!!

silvacarl changed discussion status to closed
