
4-bit, 128-groupsize GPTQ quant of https://huggingface.co/TheBloke/vicuna-13B-1.1-HF

A quantized version already exists at https://huggingface.co/TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g, but neither of the files uploaded there works with certain older versions of GPTQ-for-LLaMA, such as 0cc4m's fork, which is used with their fork of KoboldAI.

This model was quantized with 0cc4m's fork of GPTQ-for-LLaMA, using the following command:

```
python llama.py ./vicuna-13B-1.1-HF c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors 4bit-128g.safetensors
```
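
As a quick sanity check of the output file, the tensors can be inspected with the `safetensors` library. This is a minimal sketch for inspection only; actual inference requires loading through a compatible GPTQ-for-LLaMA loader (e.g. via 0cc4m's KoboldAI fork), not plain `torch.load`. The filename below matches the `--save_safetensors` argument above.

```python
# Minimal sketch: inspect the quantized checkpoint produced by the command above.
from safetensors.torch import load_file

# Load the checkpoint as a plain dict of tensors (no model code needed).
state_dict = load_file("4bit-128g.safetensors")

# GPTQ checkpoints store packed integer weights plus per-group scales and zeros;
# listing a few tensor names, shapes, and dtypes is a quick way to confirm the
# quantization ran and saved correctly.
for name, tensor in list(state_dict.items())[:10]:
    print(name, tuple(tensor.shape), tensor.dtype)
```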
