Can't run the model

#1 by MohamedRashad - opened

I use this command for inference:

CUDA_VISIBLE_DEVICES=0 python llama_inference.py elinas/vicuna-13b-4bit --wbits 4 --groupsize 128 --load vicuna-13b-4bit/vicuna-13b-4bit-128g.safetensors --text "this is llama"

and it gives me an error when loading the state dict.

See this in the README and ensure you're on the stable commit. https://huggingface.co/elinas/vicuna-13b-4bit#update-2023-04-03
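In case it helps others, pinning a GPTQ-for-LLaMa checkout to the stable commit generally looks something like this (a minimal sketch; take the exact hash from the README, and rerun setup_cuda.py after switching commits):

cd GPTQ-for-LLaMa
git fetch origin
git checkout <stable-commit-from-readme>   # placeholder; use the hash given in the README
python setup_cuda.py install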

With the stable commit indicated in the README, I'm able to load the model fine, but inference still fails with the following error:

TypeError: vecquant4matmul(): incompatible function arguments. The following argument types are supported:
    1. (arg0: at::Tensor, arg1: at::Tensor, arg2: at::Tensor, arg3: at::Tensor, arg4: at::Tensor) -> None

Any ideas? Is there a particular commit of transformers you are pinned to?

This does not support llama.cpp; this is for GPTQ via CUDA (or Triton).

Right, I understand -- this is running on an A5000 in the cloud. Perhaps it's not using the correct device; I will investigate a bit further.

EDIT -- I was able to get it working; I needed to rerun setup_cuda.
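For anyone else who hits the vecquant4matmul TypeError: it usually means the compiled CUDA extension is out of sync with the checked-out Python code, so it has to be rebuilt. A minimal sketch of the rebuild, assuming the GPTQ-for-LLaMa checkout is in the current directory:

cd GPTQ-for-LLaMa
python setup_cuda.py install   # recompiles the quant_cuda extension against the current source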

I am getting a reference error when checking out the commit specified:

fatal: reference is not a tree: a6f363e3f93b9fb5c26064b5ac7ed58d22e3f773

I have not seen that error format before, so I assumed you were using the Python llama.cpp wrapper.

> I am getting a reference error when checking out the commit specified.

Paste the topmost entry from the output of git log.

> I am getting a reference error when checking out the commit specified:
>
> fatal: reference is not a tree: a6f363e3f93b9fb5c26064b5ac7ed58d22e3f773

I think you just need to do git fetch origin a6f363e3f93b9fb5c26064b5ac7ed58d22e3f773
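For completeness, the full sequence would be something like this (a sketch; the rebuild step is the same setup_cuda fix mentioned earlier in the thread):

git fetch origin a6f363e3f93b9fb5c26064b5ac7ed58d22e3f773
git checkout a6f363e3f93b9fb5c26064b5ac7ed58d22e3f773
python setup_cuda.py install   # rebuild the CUDA extension after switching commits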

I was able to fix the quant problem; now it gives me this error:

Missing key(s) in state_dict

@nealchandra Thanks, I saw that commit is no longer in the base repo (I haven't fetched, so I can still check out that branch). I have updated the instructions to just use the fork.

> I was able to fix the quant problem; now it gives me this error:
>
> Missing key(s) in state_dict

Make sure you have followed the steps above, such as running python setup_cuda.py install and installing everything in requirements.txt.

https://github.com/oobabooga/GPTQ-for-LLaMa#installation
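For reference, the install steps in that README boil down to roughly the following (a sketch; check the linked page for the current instructions, since branch names and steps may change):

git clone https://github.com/oobabooga/GPTQ-for-LLaMa -b cuda
cd GPTQ-for-LLaMa
pip install -r requirements.txt   # the dependencies mentioned above
python setup_cuda.py install      # builds the CUDA quantization kernel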
