TheBloke/Falcon-7B-Instruct-GPTQ · Can't use with tgi. Getting `RuntimeError: weight transformer.h.0.self_attention.query_key_value.weight does not exist`

Hi there!
I'm trying to use this model with text-generation-inference (TGI). Here's the script I'm running:

volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all --shm-size 2g -p 8080:80 -v $volume:/data \
ghcr.io/huggingface/text-generation-inference:latest \
--model-id TheBloke/falcon-7b-instruct-gptq \
--sharded false \
--quantize "gptq" \
--max-total-tokens 2048 \
--trust-remote-code

However, the container fails to load the model with this error:

RuntimeError: weight transformer.h.0.self_attention.query_key_value.weight does not exist
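
In case it helps with reproducing this, here is a rough sketch for checking which tensor names a checkpoint file actually contains. It assumes the weights are stored as a `.safetensors` file (an 8-byte little-endian header length followed by a JSON header). The helper names `read_header` and `tensors_under` are made up for illustration, and the file path passed on the command line is a placeholder. GPTQ checkpoints commonly store tensors such as `qweight`, `qzeros` and `scales` instead of a plain `.weight`, which may be related to the error above, though I have not confirmed that.

```python
import json
import struct
import sys

# Prefix of the tensor named in the RuntimeError above.
PREFIX = "transformer.h.0.self_attention.query_key_value"


def read_header(stream) -> dict:
    """Read the JSON header from a binary stream in safetensors layout."""
    # The first 8 bytes hold the header length as an unsigned little-endian 64-bit int.
    (header_len,) = struct.unpack("<Q", stream.read(8))
    # The next header_len bytes are a UTF-8 JSON object keyed by tensor name.
    return json.loads(stream.read(header_len).decode("utf-8"))


def tensors_under(header: dict, prefix: str) -> list:
    """Return the sorted tensor names that start with prefix."""
    return sorted(
        name
        for name in header
        if name != "__metadata__" and name.startswith(prefix)
    )


if __name__ == "__main__":
    # Hypothetical usage: python check_tensors.py /path/to/model.safetensors
    with open(sys.argv[1], "rb") as f:
        header = read_header(f)
    print("plain .weight present:", f"{PREFIX}.weight" in header)
    for name in tensors_under(header, PREFIX):
        print(name, header[name]["dtype"], header[name]["shape"])
```

If the plain `.weight` tensor is reported as missing while quantized tensors show up under the same prefix, that would suggest the loader is looking for unquantized weight names.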
