torch.bfloat16 is not supported for quantization method awq

#2
by Pizzarino - opened

Hey, I tried the vLLM example in the model card (just copied and pasted it) and I'm running into this error:

ValueError: torch.bfloat16 is not supported for quantization method awq. Supported dtypes: [torch.float16]

Is there a fix to be able to use the AWQ model with vLLM instead of AutoAWQ?

What version of vLLM are you using? I thought the latest version supported bfloat16 with AWQ. 0.2.0, the first version with AWQ support, definitely did not, but I thought it was added later.

Either way, you should specify dtype="auto" in either Python code or as a command line parameter. That will load it in bfloat16 if it can, otherwise float16.
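For reference, here's roughly what that looks like with the vLLM Python API (the repo name below is a placeholder; use the model ID from this card):

```python
# Minimal sketch of loading an AWQ model with vLLM and dtype="auto".
# The model name is a placeholder, not a real repo.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/Some-Model-AWQ",  # placeholder; use the model ID from this card
    quantization="awq",
    dtype="auto",  # picks bfloat16 when supported, otherwise falls back to float16
)

sampling_params = SamplingParams(temperature=0.8, max_tokens=128)
outputs = llm.generate(["Tell me about AI"], sampling_params)
print(outputs[0].outputs[0].text)
```

The command-line equivalent is just adding --dtype auto when launching the vLLM server.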

This README hasn't been updated in a while - my newer README template includes the dtype="auto" parameter in the examples.

All my AWQ READMEs are going to be updated later today anyway when I update for Transformers AWQ support, so that will get changed then.

I'm using version 0.2.1.post1. I also did a reinstall just in case something got messed up during installation, and the bfloat16 issue still persisted.

I'll definitely specify the dtype in my Python code! :)

Thank you so much for your help, you're a legend. <3

Hi, you can apply the following workaround: edit config.json and change
"torch_dtype": "bfloat16" --> "torch_dtype": "float16"

Yeah but it's easier just to pass --dtype auto or dtype="auto"

For me, specifying auto didn't work; I still got the same error. But specifying dtype="float16" did work.
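In case it helps anyone else, this is roughly what worked for me (the repo name is a placeholder):

```python
# Forcing float16 explicitly, for setups where dtype="auto" still errors out.
from vllm import LLM

llm = LLM(
    model="TheBloke/Some-Model-AWQ",  # placeholder; use the model ID from this card
    quantization="awq",
    dtype="float16",
)
```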
