This is a 4-bit GPTQ-quantised version, in safetensors format, of the oasst-llama-13b-2-epochs model from dvruette/oasst-llama-13b-2-epochs. It gives a significant inference speed-up when used with oobabooga's text-generation-webui. Run it with:

python server.py --model oasst-llama-13b-2-epochs-GPTQ-4bit-128g --wbits 4 --groupsize 128
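
If you prefer to start the server from a script, the launch command above can be assembled as an argument list. This is only a minimal sketch: the helper name `build_command` is hypothetical and not part of the web UI, and it assumes `python` is on your PATH and that the command is run from the directory containing `server.py`.

```python
import shlex


def build_command(model, wbits=4, groupsize=128):
    """Return the server.py launch command as an argument list.

    Hypothetical helper for illustration; it only mirrors the flags
    shown in the command above (--model, --wbits, --groupsize).
    """
    return [
        "python", "server.py",
        "--model", model,
        "--wbits", str(wbits),
        "--groupsize", str(groupsize),
    ]


if __name__ == "__main__":
    cmd = build_command("oasst-llama-13b-2-epochs-GPTQ-4bit-128g")
    # Print the command in copy-pasteable shell form.
    print(shlex.join(cmd))
```

To actually launch the server, the list could be passed to `subprocess.run(cmd, cwd=...)`, with `cwd` pointing at your local checkout of the web UI.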