	---
	license: apache-2.0
	language:
	- zh
	- en
	---

	# Chinese-Alpaca-2-1.3B-RLHF-GGUF

	This repository contains the GGUF-v3 version (llama.cpp compatible) of Chinese-Alpaca-2-1.3B-RLHF, which is Chinese-Alpaca-2-1.3B tuned with RLHF using DeepSpeed-Chat.

	The optimal context length for this model is 1K. Specify `-c 1024` when running it with llama.cpp.
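As a minimal sketch of the context-length setting above, the following Python snippet assembles a llama.cpp command line with `-c 1024`. The executable name `llama-cli` and the model file name are assumptions made for illustration (the executable name depends on your llama.cpp build, and the file name depends on which quantization you downloaded); only the `-c 1024` setting comes from this README.

```python
import shlex


def build_llama_cpp_command(model_path, prompt, binary="llama-cli", ctx_size=1024):
    """Return an argv list that runs llama.cpp with the given context length."""
    return [
        binary,               # assumption: executable name varies by llama.cpp build
        "-m", model_path,     # path to a GGUF file from this repository
        "-c", str(ctx_size),  # context length; 1024 is the value recommended above
        "-p", prompt,         # prompt text
    ]


# Hypothetical file name; substitute the GGUF file you actually downloaded.
command = build_llama_cpp_command("ggml-model-q4_k.gguf", "你好")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command)` to launch inference once the executable and model file are in place.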


	## Performance

	Metric: perplexity (PPL); lower is better.

	| Quant | Original | imatrix (`-im`) |
	|-----|------|------|
	| Q2_K | 20.1066 +/- 0.29236 | 18.8209 +/- 0.27561 |
	| Q3_K | 16.9214 +/- 0.26133 | 16.5729 +/- 0.25706 |
	| Q4_0 | 15.8056 +/- 0.23749 | - |
	| Q4_K | 16.1579 +/- 0.25064 | 15.7746 +/- 0.24476 |
	| Q5_0 | 15.4528 +/- 0.23911 | - |
	| Q5_K | 15.3198 +/- 0.23627 | 15.4791 +/- 0.23959 |
	| Q6_K | 15.3718 +/- 0.23764 | 15.2572 +/- 0.23549 |
	| Q8_0 | 15.3302 +/- 0.23727 | - |
	| F16 | 15.3291 +/- 0.23728 | - |

	Models with the `-im` suffix are quantized using an importance matrix, which generally yields lower perplexity, though not always (Q5_K is an exception in the table above).
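To make the comparison concrete, the short sketch below computes the perplexity change (imatrix minus original) for each quantization that has both variants, using the values from the table above. It is plain arithmetic on the reported means and ignores the reported error margins.

```python
# Perplexity values copied from the table above: (original, imatrix).
ppl = {
    "Q2_K": (20.1066, 18.8209),
    "Q3_K": (16.9214, 16.5729),
    "Q4_K": (16.1579, 15.7746),
    "Q5_K": (15.3198, 15.4791),
    "Q6_K": (15.3718, 15.2572),
}

# Negative delta: the imatrix variant has lower (better) perplexity.
deltas = {quant: round(im - orig, 4) for quant, (orig, im) in ppl.items()}

# Quantizations where the imatrix variant scored worse than the original.
regressions = [quant for quant, delta in deltas.items() if delta > 0]

for quant, delta in deltas.items():
    print(f"{quant}: {delta:+.4f}")
print("imatrix is worse for:", regressions)
```

The gain is largest at the lowest bit width (Q2_K) and shrinks as precision increases, with Q5_K being the only case where the imatrix variant scores worse.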


	## Others

	For the full model in Hugging Face format, see: https://huggingface.co/hfl/chinese-alpaca-2-1.3b-rlhf

	Please refer to [https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/) for more details.