torch.bfloat16 is not supported for quantization method awq

#2
by Pizzarino - opened

Hey, I tried the vLLM example in the model card (just copied and pasted it) and I'm running into this error:

ValueError: torch.bfloat16 is not supported for quantization method awq. Supported dtypes: [torch.float16]

Is there a fix to be able to use the AWQ model with vLLM instead of AutoAWQ?

What version of vLLM are you using? I thought the latest version supported bfloat16 with AWQ. 0.2.0, the first version with AWQ support, definitely did not, but I thought it was added later.

Either way, you should specify dtype="auto" in either Python code or as a command line parameter. That will load it in bfloat16 if it can, otherwise float16.
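For reference, here's roughly what that looks like with the vLLM Python API (the repo name below is a placeholder; use the model ID from this card):

```python
# Minimal sketch of loading an AWQ model with vLLM and dtype="auto".
# The model name is a placeholder, not a real repo.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/Some-Model-AWQ",  # placeholder; use the model ID from this card
    quantization="awq",
    dtype="auto",  # picks bfloat16 when supported, otherwise falls back to float16
)

sampling_params = SamplingParams(temperature=0.8, max_tokens=128)
outputs = llm.generate(["Tell me about AI"], sampling_params)
print(outputs[0].outputs[0].text)
```

The command-line equivalent is just adding --dtype auto when launching the vLLM server.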

This README hasn't been updated in a while - my newer README template includes the dtype="auto" parameter in the examples.

All my AWQ READMEs are going to be updated later today anyway when I update for Transformers AWQ support, so that will get changed then.

I'm using version 0.2.1.post1. I also did a reinstall just in case something got messed up during installation, and the bfloat16 issue still persisted.

I'll definitely specify the dtype in my Python code! :)

Thank you so much for your help, you're a legend. <3

Hi, you can apply the following workaround: edit config.json and change
"torch_dtype": "bfloat16" --> "torch_dtype": "float16"

Yeah but it's easier just to pass --dtype auto or dtype="auto"

For me, specifying auto didn't work; I still got the same error. But specifying dtype="float16" did work.
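In case it helps anyone else, this is roughly what worked for me (the repo name is a placeholder):

```python
# Forcing float16 explicitly, for setups where dtype="auto" still errors out.
from vllm import LLM

llm = LLM(
    model="TheBloke/Some-Model-AWQ",  # placeholder; use the model ID from this card
    quantization="awq",
    dtype="float16",
)
```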
