RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'

#1
by LaferriereJC - opened

Trying to run on CPU:

model = AutoGPTQForCausalLM.from_quantized(
    model_repo,
    device=device,
    use_safetensors=True,
    use_triton=device != "cpu",  # comment/remove if not on Linux
).to(device).to(torch.float32)
"""
model = AutoGPTQForCausalLM.from_quantized(
    model_repo,
    device=device,
    use_safetensors=True,
    use_triton=device != "cpu",  # comment/remove if not on Linux
).to(device)
"""

RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'

Analytics Club at ETH Zürich org

Hi! Thanks for raising this, and I agree: AutoGPTQ does not seem to work on CPU at the moment. I have an issue open for this problem on the repo here; it would be awesome if you could also post this there so it gets more attention :)
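
For anyone landing here from the error message: it is raised when a float16 ('Half') matrix multiply runs on CPU in a PyTorch build that has no half-precision CPU kernel for it. Below is a minimal sketch of the general workaround for plain PyTorch modules, which is to pick the dtype from the device and cast both the weights and the inputs. The helper name `cpu_safe_dtype` and the toy `nn.Linear` layer are made up for illustration and are not part of AutoGPTQ. This does not fix the quantized model above: the snippet in the first post already casts to float32 and still fails, and this thread does not establish why.

```python
import torch
import torch.nn as nn


def cpu_safe_dtype(device: str) -> torch.dtype:
    """Hypothetical helper: float32 on CPU, float16 on any other device."""
    return torch.float32 if device == "cpu" else torch.float16


# Toy stand-in for a model whose weights were stored in half precision.
layer = nn.Linear(4, 2).half()
x = torch.randn(3, 4, dtype=torch.float16)

device = "cpu"
dtype = cpu_safe_dtype(device)

# Cast BOTH the module and its input; a float32 module fed a float16
# input would fail with a dtype mismatch instead.
layer = layer.to(device=device, dtype=dtype)
x = x.to(device=device, dtype=dtype)

out = layer(x)  # runs as a float32 matmul on CPU
```

Running in float32 roughly doubles the memory used by the cast tensors compared with float16, so this only makes sense for models that fit in RAM at that size.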
