Can't use with TGI: getting `RuntimeError: weight transformer.h.0.self_attention.query_key_value.weight does not exist`

#12
by mpronesti - opened

Hi there!
I'm trying to use this model with text-generation-inference (TGI). Here's the command I'm running:

volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all --shm-size 2g -p 8080:80 -v $volume:/data \
ghcr.io/huggingface/text-generation-inference:latest \
--model-id TheBloke/falcon-7b-instruct-gptq \
--sharded false \
--quantize "gptq" \
--max-total-tokens 2048 \
--trust-remote-code 

However, I get this error:

RuntimeError: weight transformer.h.0.self_attention.query_key_value.weight does not exist
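The error means TGI tried to load a tensor named `transformer.h.0.self_attention.query_key_value.weight` and did not find it in the checkpoint. Below is a minimal sketch for checking what the checkpoint actually stores under that layer name. It assumes an AutoGPTQ-style layout, where each quantized linear layer is saved as `qweight`, `qzeros` and `scales` tensors instead of a single `weight`. `describe_layer` is a hypothetical helper, not part of TGI.

```python
# Hypothetical diagnostic helper (not part of TGI): classify how one layer is
# stored, given the tensor names found in a checkpoint.
# Assumption: AutoGPTQ-style naming, where a quantized linear layer is saved as
# "<prefix>.qweight", "<prefix>.qzeros" and "<prefix>.scales" instead of "<prefix>.weight".
GPTQ_SUFFIXES = ("qweight", "qzeros", "scales")


def describe_layer(tensor_names, prefix):
    """Describe how the layer `prefix` is stored in `tensor_names`."""
    names = set(tensor_names)
    if f"{prefix}.weight" in names:
        return "unquantized (.weight present)"
    # Collect which of the GPTQ-packed tensors exist for this layer.
    found = [s for s in GPTQ_SUFFIXES if f"{prefix}.{s}" in names]
    if len(found) == len(GPTQ_SUFFIXES):
        return "GPTQ-packed (" + ", ".join(found) + ")"
    if found:
        return "partial GPTQ tensors: " + ", ".join(found)
    return "no tensors found under this prefix"


if __name__ == "__main__":
    layer = "transformer.h.0.self_attention.query_key_value"
    # Illustrative tensor names only; a real checkpoint will contain many more.
    example = [f"{layer}.qweight", f"{layer}.qzeros", f"{layer}.scales"]
    print(describe_layer(example, layer))
```

To inspect a real download, the `safetensors` package's `safe_open(path, framework="pt")` returns a handle whose `.keys()` can be passed as `tensor_names`; the file path depends on what the repository ships.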

Unfortunately, Text Generation Inference ships its own GPTQ implementation, and it doesn't support most of the GPTQ models currently on Hugging Face.

I hope to release new GPTQ models in the future that are compatible with it, but for now you'll need to find another GPTQ model that works with TGI, or quantize one yourself.
