How to do inference?

#2
by NePe - opened

What should I use to do inference with this model? I tried GPTQ-for-LLaMa, but it showed checkpoint shape mismatch errors.

Please show an error log, what specs you are running on, and any extensions like the text-generation-webui, etc. The more info, the better.

I just installed/tried the GPTQ-for-LLaMa code from GitHub with llama_inference.py. I tried many LoRA/non-LoRA 4-bit models from HF, and it seems like only ozcur/alpaca-native-4bit works for me. The others give RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM: Missing key(s) in state_dict: ... Unexpected key(s) in state_dict: ... size mismatch for model.layers ... errors.
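For anyone debugging this, a quick way to see which naming scheme a checkpoint actually uses is to load it with plain torch and print the quantization-related keys (a minimal sketch; the filename is an example, not a specific model's):

import torch

# Load only the tensors, on CPU, without building the model.
ckpt = torch.load("llama-7b-4bit.pt", map_location="cpu")

# Old GPTQ-for-LLaMa checkpoints store "*.zeros"; newer code expects "*.qzeros".
for key in sorted(ckpt):
    if key.endswith((".zeros", ".qzeros", ".scales")):
        print(key, tuple(ckpt[key].shape))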

Must be a rename in one of the libs. I compared the missing/unexpected keys, e.g.:
missing: model.layers.0.self_attn.q_proj.qzeros
unexpected: model.layers.0.self_attn.q_proj.zeros

Yep, seems like they renamed some stuff and broke the old stuff, as usually happens:
https://github.com/qwopqwop200/GPTQ-for-LLaMa/commit/a270974e732884126ddb36f64d0a0a25261bb94f
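In principle, an old checkpoint could be migrated by renaming its keys (a hypothetical, untested sketch; the paths are examples). Note the size-mismatch errors above suggest the packed tensor format changed as well, not just the names, so the downgrade below is the safer route:

import torch

ckpt = torch.load("llama-7b-4bit.pt", map_location="cpu")

# Rename old-style "*.zeros" tensors to the new "*.qzeros" name;
# everything else passes through unchanged.
migrated = {
    (key[: -len(".zeros")] + ".qzeros") if key.endswith(".zeros") else key: tensor
    for key, tensor in ckpt.items()
}

torch.save(migrated, "llama-7b-4bit-renamed.pt")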

If anyone has the same problem, just downgrade to the older version of the lib:
git checkout 468c47c01b4fe370616747b6d69a2d3f48bab5e4
python setup_cuda.py install
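Note that this pins GPTQ-for-LLaMa to the pre-rename format: checkpoints quantized with newer commits will then fail in the opposite direction (unexpected qzeros keys), so keep the library version matched to whatever produced your checkpoint.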

Seems to be working fine now :)

Now we cannot use this fix, since the webui has also updated. I get a parameter mismatch if I check out an older commit.

New model is uploaded, going to close this issue.

elinas changed discussion status to closed

Any idea why text-generation-webui cannot find the config.json file? I have everything needed in the folder.
https://github.com/oobabooga/text-generation-webui/issues/613
Hope someone can figure it out
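If the model folder only has the quantized .pt/.safetensors file, one workaround is to pull the config and tokenizer files from the matching base model repo on the Hub (a sketch; the repo id and target folder are examples, not necessarily this model's):

from huggingface_hub import hf_hub_download

# Fetch the non-weight files the webui expects to sit next to the checkpoint.
for fname in ["config.json", "tokenizer.model", "tokenizer_config.json"]:
    hf_hub_download(
        repo_id="decapoda-research/llama-30b-hf",  # example base repo
        filename=fname,
        local_dir="models/alpaca-30b-4bit",        # example webui models folder
    )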
