
Chinese-LLaMA-2-7B-GGUF

This repository contains the GGUF-v3 models (llama.cpp compatible) for Chinese-LLaMA-2-7B.
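Because the files are in GGUF format, they can be loaded by any llama.cpp-based runtime. Below is a minimal inference sketch using the llama-cpp-python bindings; the file name is an assumption, so substitute whichever quantization you actually downloaded.

```python
# Minimal sketch using the llama-cpp-python bindings
# (pip install llama-cpp-python). The model path below is a
# hypothetical example -- use the GGUF file you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="chinese-llama-2-7b.Q4_K.gguf",  # assumed local file name
    n_ctx=2048,       # context window size
    n_gpu_layers=0,   # raise to offload layers if built with GPU support
)

out = llm("中国的首都是", max_tokens=32)
print(out["choices"][0]["text"])
```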

Performance

Metric: PPL (perplexity); lower is better.
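For reference, perplexity over a sequence of N tokens is the exponentiated average negative log-likelihood (the standard definition, not specific to this repository):

$$\mathrm{PPL} = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log p(x_i \mid x_{<i})\right)$$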

| Quant | PPL (original)       | PPL (imatrix, `-im`) |
|-------|----------------------|----------------------|
| Q2_K  | 15.1160 +/- 0.30469  | 12.7682 +/- 0.26022  |
| Q3_K  | 9.9588 +/- 0.20549   | 9.8508 +/- 0.20484   |
| Q4_0  | 9.8085 +/- 0.20350   | -                    |
| Q4_K  | 9.5802 +/- 0.20015   | 9.6327 +/- 0.20219   |
| Q5_0  | 9.4783 +/- 0.19622   | -                    |
| Q5_K  | 9.5132 +/- 0.19989   | 9.4447 +/- 0.19772   |
| Q6_K  | 9.4640 +/- 0.19909   | 9.4507 +/- 0.19849   |
| Q8_0  | 9.4659 +/- 0.19927   | -                    |
| F16   | 9.4627 +/- 0.19921   | -                    |

Models with the -im suffix are quantized using an importance matrix, which generally (though not always) yields better perplexity.
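For illustration, an imatrix-quantized GGUF is typically produced with llama.cpp's llama-imatrix and llama-quantize tools. The sketch below shells out to those tools from Python; all file names and the calibration text are assumptions, and exact flags may differ between llama.cpp releases.

```python
# Sketch of imatrix quantization via llama.cpp's command-line tools,
# assuming `llama-imatrix` and `llama-quantize` are on PATH and that an
# F16 GGUF plus a calibration text file exist locally (all names here
# are hypothetical).
import subprocess

# 1. Compute the importance matrix from calibration text.
subprocess.run([
    "llama-imatrix",
    "-m", "chinese-llama-2-7b.F16.gguf",
    "-f", "calibration.txt",
    "-o", "imatrix.dat",
], check=True)

# 2. Quantize with the importance matrix to produce the -im variant.
subprocess.run([
    "llama-quantize",
    "--imatrix", "imatrix.dat",
    "chinese-llama-2-7b.F16.gguf",
    "chinese-llama-2-7b.Q4_K-im.gguf",
    "Q4_K",
], check=True)
```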

Others

For the Hugging Face version, please see: https://huggingface.co/hfl/chinese-llama-2-7b

Please refer to https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/ for more details.


