Meta AI's LLaMA 13B, quantized to 4 bits with the GPTQ algorithm (v2 format).

Conversion process (act-order + groupsize 128):

CUDA_VISIBLE_DEVICES=0 python llama.py ./llama-13b c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors ./q4/llama13b-4bit-ts-ao-g128-v2.safetensors

Note: this variant will fail to load with the current GPTQ-for-LLaMa implementation.

Conversion process (act-order only, no groupsize):

CUDA_VISIBLE_DEVICES=0 python llama.py ./llama-13b c4 --wbits 4 --true-sequential --act-order --save_safetensors ./q4/llama13b-4bit-v2.safetensors
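For readers unfamiliar with the `--wbits 4 --groupsize 128` parameters above: group-wise low-bit quantization stores one (scale, zero-point) pair per group of 128 consecutive weights instead of per whole row, trading a little extra storage for lower reconstruction error. The sketch below is a minimal, hypothetical NumPy illustration of that idea only; it is not the GPTQ algorithm itself (which additionally uses second-order error compensation) and none of these function names come from GPTQ-for-LLaMa.

```python
import numpy as np

def quantize_4bit_groupwise(w, groupsize=128):
    """Asymmetric 4-bit quantization of a 1-D weight vector,
    with one (scale, zero-point) pair per group of `groupsize` values.
    Illustrative only -- not the actual GPTQ procedure."""
    w = w.reshape(-1, groupsize)
    wmin = w.min(axis=1, keepdims=True)
    wmax = w.max(axis=1, keepdims=True)
    scale = (wmax - wmin) / 15.0              # 4 bits -> 16 levels (0..15)
    zero = np.round(-wmin / scale)            # per-group zero-point
    q = np.clip(np.round(w / scale) + zero, 0, 15).astype(np.uint8)
    return q, scale, zero

def dequantize(q, scale, zero):
    # Recover an approximation of the original weights.
    return (q.astype(np.float32) - zero) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)
q, scale, zero = quantize_4bit_groupwise(w, groupsize=128)
w_hat = dequantize(q, scale, zero).reshape(-1)
print(np.abs(w - w_hat).max())  # small, bounded by roughly one scale step
```

Dropping `--groupsize` (as in the second command) corresponds to one scale/zero pair over a much larger span of weights, which loads more widely but reconstructs less accurately.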