Wrong shape when loading with PEFT + AutoGPTQ

#10 opened by tridungduong16

I can fine-tune the 13B model without issues, but it fails for 70B with a reshape error. Has anyone tried to fine-tune it?

 File "/envs/dtd_env/lib/python3.10/site-packages/auto_gptq/nn_modules/qlinear/qlinear_triton.py", line 141, in forward
    out = out.half().reshape(out_shape)
RuntimeError: shape '[1, 665, 24576]' is invalid for input of size 6809600
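
For what it's worth, the numbers in that error look consistent with 70B's grouped-query attention (GQA) tripping up AutoGPTQ's fused attention, which appears to assume all three q/k/v projections are hidden_size wide. A minimal sketch of the arithmetic, assuming Llama-2 70B's published config (hidden_size 8192, 64 query heads, 8 KV heads, head_dim 128) rather than anything taken from the traceback itself:

```python
# Shape arithmetic for the error above, assuming the Llama-2 70B config.
hidden_size = 8192
head_dim = 128
num_heads = 64       # query heads
num_kv_heads = 8     # grouped-query attention (GQA) key/value heads
seq_len = 665        # from the error message

q_out = num_heads * head_dim       # 8192
kv_out = num_kv_heads * head_dim   # 1024 each for k_proj and v_proj

# What a fused qkv projection actually produces under GQA:
actual = seq_len * (q_out + 2 * kv_out)         # 665 * 10240 = 6_809_600
# What the kernel's out_shape expects if k/v were as wide as q (plain MHA):
expected_shape = (1, seq_len, 3 * hidden_size)  # [1, 665, 24576]

print(actual)          # 6809600, matching "input of size 6809600"
print(expected_shape)  # matching the invalid shape in the RuntimeError
```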

I got a similar error message fine-tuning a 70B GPTQ model. @tridungduong16, have you figured out a solution?

File ~/anaconda3/envs/py310_torch20/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py:195 in forward
key_states = self.k_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
shape '[1, 768, 64, 128]' is invalid for input of size 786432
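
Same root cause here, it seems: that line views the k_proj output with num_heads (64), but under GQA k_proj is only num_key_value_heads * head_dim wide (768 * 8 * 128 = 786432, exactly the input size in the error). The GQA-aware Llama code in newer transformers (4.31, if I remember right) views k/v with num_key_value_heads instead. A minimal sketch, assuming the same 70B config as above:

```python
import torch

bsz, q_len = 1, 768
num_heads, num_kv_heads, head_dim = 64, 8, 128  # assumed 70B config

# k_proj output under GQA: 768 * (8 * 128) = 786_432 elements
k = torch.zeros(bsz, q_len, num_kv_heads * head_dim)

# The view from the traceback assumes MHA and fails:
#   k.view(bsz, q_len, num_heads, head_dim)  # needs 768 * 64 * 128 elements

# GQA-aware view, matching the actual k_proj width:
key_states = k.view(bsz, q_len, num_kv_heads, head_dim).transpose(1, 2)
print(key_states.shape)  # torch.Size([1, 8, 768, 128])
```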

My recent understanding is that stacking two LoRA layers on base Llama is a bad idea; it's better to mix both LoRA training datasets, and that requires fine-tuning on Llama 2 directly... testing that now (a rough sketch of the setup is below).
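
For reference, a minimal sketch of what fine-tuning Llama 2 directly with a single LoRA via PEFT could look like; the checkpoint name and hyperparameters here are illustrative assumptions, not from this thread, and you'd want a transformers version recent enough to handle the 70B GQA shapes:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Assumed checkpoint; swap in whatever base model you are actually using.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",
    device_map="auto",
)

# Illustrative LoRA hyperparameters targeting the attention projections.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```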
