GPTQModel doesn't support the vLLM backend for this model.
#4 · opened by Anditty
import torch
from gptqmodel import GPTQModel, BACKEND

model = GPTQModel.from_quantized(
    model_name,
    trust_remote_code=True,
    device_map="sequential",
    max_memory=max_memory,
    torch_dtype=torch.float16,
    backend=BACKEND.VLLM,
)
[rank0]: assert self.quant_method is not None
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: AssertionError
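The assertion appears to come from vLLM's model loader: it could not map this model's quantization config to a quantization method it supports, so `self.quant_method` stays `None` and the load aborts. A minimal sketch of a possible workaround, assuming only the vLLM backend is unsupported and the failure surfaces as the AssertionError shown above (`model_name` and `max_memory` are placeholders for the values used in the original call):

import torch
from gptqmodel import GPTQModel, BACKEND

model_name = "path/to/quantized-model"  # placeholder: the repo id used above
max_memory = None                       # placeholder: per-device memory map, if any

try:
    # First attempt: the vLLM backend, which fails for this model.
    model = GPTQModel.from_quantized(
        model_name,
        trust_remote_code=True,
        device_map="sequential",
        max_memory=max_memory,
        torch_dtype=torch.float16,
        backend=BACKEND.VLLM,
    )
except AssertionError:
    # vLLM could not resolve a quant_method for this model's quantization
    # config; retry with GPTQModel's automatic kernel selection instead.
    model = GPTQModel.from_quantized(
        model_name,
        trust_remote_code=True,
        device_map="sequential",
        max_memory=max_memory,
        torch_dtype=torch.float16,
        backend=BACKEND.AUTO,
    )

If the `BACKEND.AUTO` path loads cleanly, the problem is isolated to the vLLM integration rather than the quantized weights themselves.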