This is a GPTQ 4-bit quantized version of Auto-J-13B, converted using this script (by TheBloke).
To use the 4-bit version of Auto-J, you need to install the following packages:

```shell
pip install safetensors
pip install "transformers>=4.32.0" "optimum>=1.12.0"
pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/  # Use cu117 if on CUDA 11.7
```
Loading this model takes about 8GB of VRAM; a usage example is provided in example_gptq4bits.py.
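With the packages above installed, transformers can load a GPTQ checkpoint directly. The sketch below is an assumption about what such usage looks like, not the contents of example_gptq4bits.py; the repository id and the prompt are placeholders.

```python
# Minimal sketch of loading and querying the 4-bit GPTQ model.
# "GAIR/autoj-13b-GPTQ-4bits" is an assumed repo id -- replace it with the
# actual Hugging Face repository name of this model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "GAIR/autoj-13b-GPTQ-4bits"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" places the ~8GB of quantized weights on the available GPU;
# auto-gptq and optimum handle dequantization transparently.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Evaluate the following response to the query ..."  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512, do_sample=False)

# Decode only the newly generated tokens, skipping the echoed prompt.
new_tokens = output[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

Greedy decoding (`do_sample=False`) is used here for reproducibility; adjust generation parameters as needed.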
Note that the behaviour of the quantized model may differ from that of the original.
Please refer to our GitHub repo for more details.