Inference error: tensor shape mismatch

#18 opened by alejandrofdz

Hi everyone, thanks @TheBloke for your great work.

I'm trying to run inference with TheBloke/Llama-2-70B-chat-GPTQ and I get the following error:

```
out = out + self.bias if self.bias is not None else out
RuntimeError: The size of tensor a (24576) must match the size of tensor b (10240) at non-singleton dimension 2
```

At first I thought it was an installation problem, but my code works with TheBloke/Llama-2-13B-chat-GPTQ. It also occurs with FreeWilly2, maybe because it's based on Llama-2-70B.
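For reference, my code looks roughly like this (a simplified sketch using AutoGPTQ; the prompt and generation settings here are placeholders, not my exact script):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name = "TheBloke/Llama-2-70B-chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_name,
    device="cuda:0",
    use_safetensors=True,
)

# The shape error above is raised during generate()
prompt = "Hello, who are you?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```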

Any help will be appreciated.

Can you check the sha256sum of the .safetensors file, or just try downloading the model again? The download may have terminated early, leaving you with an invalid file.
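For example, something like this from Python (the filename is a placeholder for whichever .safetensors file you downloaded; compare the digest against the SHA256 shown on the model repo's file page):

```python
import hashlib

# Replace with the path to your downloaded weights
path = "gptq_model-4bit.safetensors"

h = hashlib.sha256()
with open(path, "rb") as f:
    # Read in 1 MiB chunks so a multi-GB file doesn't fill memory
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)

print(h.hexdigest())
```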

Also, please confirm you're using Transformers 4.31.0, which is required for 70B.
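You can check with:

```python
# Print the installed Transformers version; anything older than 4.31.0
# will fail on the 70B architecture
import transformers
print(transformers.__version__)
```

and upgrade with `pip install --upgrade transformers` if it prints something older.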

Thank you so much @TheBloke! It was the Transformers version; I thought I had the newest!

Regards!

I face a similar problem when fine-tuning with AutoGPTQ. Did you solve the problem?

@tridungduong16 make sure you're using AutoGPTQ 0.3.2 + Transformers 4.31.0
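A quick sanity check for both (if either version is off, `pip install auto-gptq==0.3.2 transformers==4.31.0` should fix it):

```python
# Verify both library versions before fine-tuning
import auto_gptq
import transformers

print("auto_gptq:", auto_gptq.__version__)        # expect 0.3.2
print("transformers:", transformers.__version__)  # expect 4.31.0
```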

Indeed, I'm already using the latest version of Transformers.

[Screenshot 2023-07-27 at 10.15.00 pm: error traceback]

Same error with 4.31.0.

[Screenshot 2023-07-27 at 10.17.06 pm: error traceback]

@tridungduong16 I'm confused: you said you had a problem with AutoGPTQ, but your error screenshots show ExLlama, not AutoGPTQ?

If you're using ExLlama then please make sure ExLlama is updated to the latest version. This model definitely works with ExLlama, so you might have an older version that doesn't support 70B.

Sorry, I attached the wrong screenshot. I use the fine-tuning script from https://github.com/PanQiWei/AutoGPTQ/blob/main/examples/peft/peft_lora_clm_instruction_tuning.py.

It works well for a 13B model.

The library versions I use are:

  • `transformers.__version__` → '4.32.0.dev0'
  • `auto_gptq.__version__` → '0.3.2'
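For context, the loading portion of that script boils down to something like this (a condensed sketch; the LoRA hyperparameters are illustrative, and I've swapped in the 70B model that fails):

```python
from auto_gptq import AutoGPTQForCausalLM
from auto_gptq.utils.peft_utils import GPTQLoraConfig, get_gptq_peft_model

model = AutoGPTQForCausalLM.from_quantized(
    "TheBloke/Llama-2-70B-chat-GPTQ",
    device="cuda:0",
    use_safetensors=True,
    trainable=True,                # enable LoRA training on quantized linears
    inject_fused_attention=False,  # fused attention modules are not trainable
)

peft_config = GPTQLoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_gptq_peft_model(model, peft_config=peft_config, train_mode=True)
model.print_trainable_parameters()
```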

[Screenshot 2023-07-28 at 8.22.12 am: error traceback]
