# FACET

## How to run

```bash
python main.py --full-name <model-full-name> -s <save-dir> --enable-converter -c <path-to-convert.py> --enable-quantizer -q <quantizer> -t <quantization-type>
```

For example:

```bash
python main.py --full-name baichuan-inc/Baichuan2-13B-Chat -s /home/ubuntu/workspace/models/ --enable-converter -c /home/ubuntu/workspace/llama.cpp/convert.py --enable-quantizer -q llama-cpp-quantizer -t Q5_K_M
```

## Allowed quantization types

```text
   2  or  Q4_0   : 3.56G, +0.2166 ppl @ LLaMA-v1-7B
   3  or  Q4_1   : 3.90G, +0.1585 ppl @ LLaMA-v1-7B
   8  or  Q5_0   : 4.33G, +0.0683 ppl @ LLaMA-v1-7B
   9  or  Q5_1   : 4.70G, +0.0349 ppl @ LLaMA-v1-7B
  10  or  Q2_K   : 2.63G, +0.6717 ppl @ LLaMA-v1-7B
  12  or  Q3_K   : alias for Q3_K_M
  11  or  Q3_K_S : 2.75G, +0.5551 ppl @ LLaMA-v1-7B
  12  or  Q3_K_M : 3.07G, +0.2496 ppl @ LLaMA-v1-7B
  13  or  Q3_K_L : 3.35G, +0.1764 ppl @ LLaMA-v1-7B
  15  or  Q4_K   : alias for Q4_K_M
  14  or  Q4_K_S : 3.59G, +0.0992 ppl @ LLaMA-v1-7B
  15  or  Q4_K_M : 3.80G, +0.0532 ppl @ LLaMA-v1-7B
  17  or  Q5_K   : alias for Q5_K_M
  16  or  Q5_K_S : 4.33G, +0.0400 ppl @ LLaMA-v1-7B
  17  or  Q5_K_M : 4.45G, +0.0122 ppl @ LLaMA-v1-7B
  18  or  Q6_K   : 5.15G, -0.0008 ppl @ LLaMA-v1-7B
   7  or  Q8_0   : 6.70G, +0.0004 ppl @ LLaMA-v1-7B
   1  or  F16    : 13.00G @ 7B
   0  or  F32    : 26.00G @ 7B
          COPY   : only copy tensors, no quantizing
```
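
For orientation, here is a minimal sketch of the download-convert-quantize pipeline that a run like the example above appears to drive, assuming `main.py` wraps a Hugging Face download, llama.cpp's `convert.py`, and its `quantize` binary. The function name, output filenames, and the assumption that `quantize` sits next to `convert.py` are hypothetical, not FACET's actual internals:

```python
# Hypothetical sketch of the pipeline behind the example invocation above.
# Helper names, file layout, and the quantize binary's location are assumptions.
import subprocess
from pathlib import Path

from huggingface_hub import snapshot_download  # pip install huggingface_hub


def run_pipeline(full_name: str, save_dir: str, converter: str, quant_type: str) -> Path:
    """Download a model, convert it to F16 GGUF, then quantize it with llama.cpp."""
    model_dir = Path(save_dir) / full_name.split("/")[-1]

    # 1. Download the model weights from the Hugging Face Hub.
    snapshot_download(repo_id=full_name, local_dir=model_dir)

    # 2. Convert to GGUF with llama.cpp's convert.py.
    f16_path = model_dir / "ggml-model-f16.gguf"
    subprocess.run(
        ["python", converter, str(model_dir), "--outtype", "f16", "--outfile", str(f16_path)],
        check=True,
    )

    # 3. Quantize with the llama.cpp `quantize` tool, e.g. quant_type="Q5_K_M".
    quantize_bin = Path(converter).parent / "quantize"  # assumed location
    out_path = model_dir / f"ggml-model-{quant_type.lower()}.gguf"
    subprocess.run([str(quantize_bin), str(f16_path), str(out_path), quant_type], check=True)
    return out_path


if __name__ == "__main__":
    run_pipeline(
        full_name="baichuan-inc/Baichuan2-13B-Chat",
        save_dir="/home/ubuntu/workspace/models/",
        converter="/home/ubuntu/workspace/llama.cpp/convert.py",
        quant_type="Q5_K_M",
    )
```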