Any idea how to test this for inference using vLLM?
#1 opened by silvacarl
We tried every method we can think of, but we just keep getting error messages saying AWQ is not an option.
Yes, my recent AWQ READMEs contain this extra info:
Note: at the time of writing, vLLM has not yet done a new release with support for the `quantization` parameter.
If you try the code below and get an error about `quantization` being unrecognised, please install vLLM from GitHub source.
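For reference, here is a minimal sketch of offline AWQ inference with vLLM; the model repo name is a placeholder, and the sampling settings are just example values:

```python
# Minimal sketch of AWQ inference with vLLM.
# If `quantization` is unrecognised, install vLLM from GitHub source first, e.g.:
#   pip install git+https://github.com/vllm-project/vllm.git
from vllm import LLM, SamplingParams

prompts = ["Tell me about AI"]
sampling_params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=256)

# quantization="awq" tells vLLM to load the AWQ-quantised weights
# (replace the model name with the actual AWQ repo you are testing)
llm = LLM(model="TheBloke/Some-Model-AWQ", quantization="awq")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```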
Got it, will do. Thanks!
silvacarl changed discussion status to closed