Chinese-Alpaca-2-1.3B-RLHF-GGUF

This repository contains the GGUF-v3 (llama.cpp compatible) version of Chinese-Alpaca-2-1.3B-RLHF, which is tuned from Chinese-Alpaca-2-1.3B with RLHF using DeepSpeed-Chat.

The optimal context length for this model is 1K. Specify -c 1024 when running it with llama.cpp.
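As a minimal sketch of the setting above, the following Python snippet assembles a llama.cpp command line with the context length fixed at 1024. The binary name `llama-cli` (older builds ship it as `main`) and the model file name `ggml-model-q4_k.gguf` are assumptions for illustration; substitute the names used by your build and the file you downloaded. The snippet only builds and prints the command, it does not run it.

```python
import shlex


def build_llama_cpp_command(model_path, prompt, binary="llama-cli", context_length=1024):
    """Return the argument list for a llama.cpp run with the given context length.

    `binary` is an assumed executable name; older llama.cpp builds call it `main`.
    """
    return [
        binary,
        "-m", model_path,           # path to the GGUF file
        "-c", str(context_length),  # context length; 1024 is recommended for this model
        "-p", prompt,               # prompt text
    ]


# Hypothetical file name, used only for illustration.
cmd = build_llama_cpp_command("ggml-model-q4_k.gguf", "Hello")
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd)` once the binary and model paths point at real files.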

Performance

Metric: perplexity (PPL); lower is better.

Quant	original	imatrix (`-im`)
Q2_K	20.1066 +/- 0.29236	18.8209 +/- 0.27561
Q3_K	16.9214 +/- 0.26133	16.5729 +/- 0.25706
Q4_0	15.8056 +/- 0.23749	-
Q4_K	16.1579 +/- 0.25064	15.7746 +/- 0.24476
Q5_0	15.4528 +/- 0.23911	-
Q5_K	15.3198 +/- 0.23627	15.4791 +/- 0.23959
Q6_K	15.3718 +/- 0.23764	15.2572 +/- 0.23549
Q8_0	15.3302 +/- 0.23727	-
F16	15.3291 +/- 0.23728	-

Models with the -im suffix are quantized with an importance matrix (imatrix), which generally yields lower PPL, though not always (Q5_K is slightly worse with it).
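To make the comparison concrete, the sketch below computes the absolute and relative PPL change for each quantization type that has both an original and an imatrix result. The values are copied from the table above; the helper name `ppl_change` is introduced here for illustration only.

```python
# PPL values copied from the table above: quant -> (original, imatrix)
ppl = {
    "Q2_K": (20.1066, 18.8209),
    "Q3_K": (16.9214, 16.5729),
    "Q4_K": (16.1579, 15.7746),
    "Q5_K": (15.3198, 15.4791),
    "Q6_K": (15.3718, 15.2572),
}


def ppl_change(original, imatrix):
    """Return (absolute, relative %) PPL change; negative means imatrix is better."""
    delta = imatrix - original
    return delta, 100.0 * delta / original


changes = {quant: ppl_change(*pair) for quant, pair in ppl.items()}
for quant, (delta, pct) in changes.items():
    verdict = "better" if delta < 0 else "worse"
    print(f"{quant}: {delta:+.4f} ({pct:+.2f}%) -> imatrix is {verdict}")
```

The gain is largest at Q2_K and shrinks as the bit width grows. For Q5_K and Q6_K the difference is smaller than the +/- range reported in the table, so those two results should be read as roughly equivalent.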

Others

For the full model in Hugging Face format, please see: https://huggingface.co/hfl/chinese-alpaca-2-1.3b-rlhf

Please refer to https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/ for more details.