TGI: gptq quantization is not supported for AutoModel

#1
by 4639-94d6

I'm trying to get this model to run with the Hugging Face text-generation-inference Docker container on a single NVIDIA A10G. However, the container crashes with the following error: NotImplementedError: gptq quantization is not supported for AutoModel. Any clue what causes this error? This is the command I run:

```shell
docker run -d --restart unless-stopped --gpus all --network host --shm-size 1g \
  -e HUGGING_FACE_HUB_TOKEN=xxx \
  -v $volume:/data \
  --name sw3 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id AI-Sweden-Models/gpt-sw3-20b-instruct-4bit-gptq \
  --max-input-length 4096 \
  --max-total-tokens 8192 \
  --quantize gptq
```
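From a quick skim of the TGI repository, the error seems to come from the model-dispatch fallback: GPTQ is only wired up for architectures that have a flash/custom implementation in TGI, and anything else falls back to a generic AutoModel path that rejects GPTQ. A paraphrased sketch of that logic (the helper names here are hypothetical, not the actual source):

```python
# Paraphrased sketch of TGI's model dispatch (hypothetical helper names,
# not the actual TGI source): GPTQ only works on the flash/custom model
# path, so the generic AutoModel fallback refuses it.
def get_model(model_id: str, quantize: str | None):
    if has_flash_implementation(model_id):           # hypothetical helper
        return load_flash_model(model_id, quantize)  # GPTQ handled here
    # No custom implementation for this architecture (GPT-SW3 appears to
    # be one of these), so TGI falls back to plain AutoModelForCausalLM...
    if quantize == "gptq":
        # ...which is exactly where the reported error is raised.
        raise NotImplementedError(
            "gptq quantization is not supported for AutoModel"
        )
    return load_auto_model(model_id)
```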
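As a sanity check that the checkpoint itself is fine, here is a minimal sketch of loading it directly with transformers, bypassing TGI entirely. This assumes transformers >= 4.32 with optimum and auto-gptq installed; the prompt and generation settings are just illustrative:

```python
# Minimal sketch: load the GPTQ checkpoint with plain transformers.
# Assumes optimum and auto-gptq are installed so that transformers can
# pick up the quantization_config stored in the checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AI-Sweden-Models/gpt-sw3-20b-instruct-4bit-gptq"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # place the quantized weights on the A10G
    torch_dtype=torch.float16,
)

prompt = "Hej! Hur mår du?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```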
