TGI: gptq quantization is not supported for AutoModel
#1 · opened by 4639-94d6
I'm trying to run this model with the Hugging Face text-generation-inference (TGI) Docker container on a single NVIDIA A10G. However, it crashes with the following error: NotImplementedError: gptq quantization is not supported for AutoModel. Any clue what causes this error?
docker run -d --restart unless-stopped --gpus all --network host --shm-size 1g \
  -e HUGGING_FACE_HUB_TOKEN=xxx \
  -v $volume:/data \
  --name sw3 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id AI-Sweden-Models/gpt-sw3-20b-instruct-4bit-gptq \
  --max-input-length 4096 \
  --max-total-tokens 8192 \
  --quantize gptq
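As a quick sanity check that the repo really is a GPTQ checkpoint, a GPTQ-quantized model's config.json carries a quantization_config block with quant_method set to "gptq". The sketch below is illustrative only: the inline config is hand-written to mimic that shape (GPT-SW3 models report model_type "gpt2"), not the actual file from the Hub.

```python
import json

# Hand-written example of the kind of quantization_config a GPTQ
# checkpoint's config.json contains (values here are illustrative,
# not copied from AI-Sweden-Models/gpt-sw3-20b-instruct-4bit-gptq).
example_config = json.loads("""
{
  "model_type": "gpt2",
  "quantization_config": {
    "quant_method": "gptq",
    "bits": 4,
    "group_size": 128
  }
}
""")

def is_gptq_checkpoint(config: dict) -> bool:
    """Return True if the config declares GPTQ quantization."""
    qc = config.get("quantization_config") or {}
    return qc.get("quant_method") == "gptq"

print(is_gptq_checkpoint(example_config))  # → True
```

If the checkpoint looks fine, the error likely comes from TGI itself rather than the model files: when TGI has no dedicated implementation for an architecture, it falls back to a generic AutoModel code path, and that fallback rejects the --quantize gptq flag.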