Error(s) in loading state_dict for LlamaForCausalLM
opened by noprompt
After running
python -m fastchat.serve.cli --model-path LLaMa/airoboros-33B-gpt4-1.2-GPTQ --gptq-ckpt LLaMa/airoboros-33B-gpt4-1.2-GPTQ/airoboros-33b-gpt4-1.2-GPTQ-4bit--1g.act.order.safetensors --gptq-wbits 4 --gptq-groupsize 128
I got this nasty error:
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
size mismatch for model.layers.0.self_attn.k_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
size mismatch for model.layers.0.self_attn.k_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.0.self_attn.o_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
size mismatch for model.layers.0.self_attn.o_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
...
size mismatch for model.layers.59.mlp.gate_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.59.mlp.up_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
size mismatch for model.layers.59.mlp.up_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
I believe this is because FastChat must be assuming the wrong group_size for the model. This model has group_size = -1, meaning no grouping: the whole input dimension is quantized as a single group, which is why every qzeros/scales tensor in the checkpoint has 1 in its first dimension. Passing --gptq-groupsize 128 makes FastChat build a model expecting 6656 / 128 = 52 groups, hence the 52s in the expected shapes.

I have never tested with FastChat or its GPTQ-for-LLaMa implementation, so I can't yet provide support for that. But see if there's a way to specify the group_size, and then specify it as -1.
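If you want to sanity-check what group_size a GPTQ checkpoint was quantized with before handing it to FastChat, you can read the qzeros shape straight out of the safetensors file. Here is a minimal sketch; the tensor name and the 6656 input dimension are taken from the error log above, and inferring group_size from the qzeros row count is my assumption about the GPTQ-for-LLaMa layout:

```python
# Sketch: infer a GPTQ checkpoint's group_size from its qzeros shape.
from safetensors import safe_open

ckpt = "LLaMa/airoboros-33B-gpt4-1.2-GPTQ/airoboros-33b-gpt4-1.2-GPTQ-4bit--1g.act.order.safetensors"
in_features = 6656  # LLaMA-33B hidden size, i.e. k_proj's input dimension

with safe_open(ckpt, framework="pt") as f:
    qzeros = f.get_tensor("model.layers.0.self_attn.k_proj.qzeros")

n_groups = qzeros.shape[0]  # one row of zero-points per quantization group
group_size = -1 if n_groups == 1 else in_features // n_groups
print(f"{n_groups} group(s) -> group_size = {group_size}")
```

A [1, 832] qzeros means a single group, i.e. group_size = -1, whereas --gptq-groupsize 128 expects 6656 / 128 = 52 groups.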
Hey, thanks for your help! I really appreciate it. It looks like dropping the --gptq-groupsize flag works too.
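For anyone who lands here: the working invocation is presumably just the original command with the group-size flag removed (or, per the suggestion above, set to -1 if FastChat accepts that):
python -m fastchat.serve.cli --model-path LLaMa/airoboros-33B-gpt4-1.2-GPTQ --gptq-ckpt LLaMa/airoboros-33B-gpt4-1.2-GPTQ/airoboros-33b-gpt4-1.2-GPTQ-4bit--1g.act.order.safetensors --gptq-wbits 4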