This is a GPTQ 4-bit version of Auto-J-13B. We converted it using this script (by TheBloke).

To use the 4-bit version of Auto-J, you need to install the following packages:

pip install safetensors
pip install "transformers>=4.32.0" "optimum>=1.12.0"
pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/  # Use cu117 if on CUDA 11.7

It takes about 8 GB of VRAM to load this model, and we provide a usage example in example_gptq4bits.py.
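
Below is a minimal loading sketch using the GPTQ integration in transformers (backed by the optimum and auto-gptq packages installed above). The repository id "GAIR/autoj-13b-GPTQ-4bits" and the prompt are illustrative assumptions, not taken from this card; please treat example_gptq4bits.py as the authoritative usage.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Assumption: replace with the actual Hugging Face repo id of this quantized model.
model_id = "GAIR/autoj-13b-GPTQ-4bits"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# With transformers>=4.32.0, optimum and auto-gptq installed, the GPTQ weights
# are loaded directly; device_map="auto" places the model on the available GPU.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Illustrative prompt; Auto-J is an evaluation/judge model, so in practice you
# would format the query/response pair as described in the GitHub repo.
prompt = "Write a short poem about autumn."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```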

Note that the behaviour of the quantized model may differ from that of the original model.

Please refer to our GitHub repo for more details.
