
Chinese-LLaMA-2-13B-GGUF

This repository contains the GGUF-v3 models (llama.cpp compatible) for Chinese-LLaMA-2-13B.
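As a quick start, here is a minimal sketch of loading one of these files with the llama-cpp-python bindings. The filename, context size, and prompt are illustrative assumptions, not part of this release:

```python
# Minimal sketch: load a GGUF quant with llama-cpp-python and generate text.
from llama_cpp import Llama

llm = Llama(
    model_path="chinese-llama-2-13b.Q4_K.gguf",  # hypothetical filename
    n_ctx=4096,        # context window size (assumption)
    n_gpu_layers=-1,   # offload all layers to GPU; set 0 for CPU-only
)

# Prompt: "Briefly introduce China's Four Great Inventions."
out = llm("请简要介绍一下中国的四大发明。", max_tokens=256, temperature=0.7)
print(out["choices"][0]["text"])
```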

Performance

Metric: PPL (perplexity), lower is better

| Quant | Original            | imatrix (`-im`)     |
|-------|---------------------|---------------------|
| Q2_K  | 14.4701 +/- 0.26107 | 17.4275 +/- 0.31909 |
| Q3_K  | 10.1620 +/- 0.18277 | 9.7486 +/- 0.17744  |
| Q4_0  | 9.8633 +/- 0.17792  | -                   |
| Q4_K  | 9.2735 +/- 0.16793  | 9.2734 +/- 0.16792  |
| Q5_0  | 9.3553 +/- 0.16945  | -                   |
| Q5_K  | 9.1767 +/- 0.16634  | 9.1594 +/- 0.16590  |
| Q6_K  | 9.1326 +/- 0.16546  | 9.1478 +/- 0.16583  |
| Q8_0  | 9.1394 +/- 0.16574  | -                   |
| F16   | 9.1050 +/- 0.16518  | -                   |

Models with the -im suffix were generated with an importance matrix, which generally yields better perplexity, though not always (see the Q2_K and Q6_K rows above).
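For context, perplexity is the exponential of the mean negative log-likelihood over the evaluation tokens. A tiny self-contained sketch of the computation (not llama.cpp's own perplexity tool):

```python
import math

def perplexity(token_logprobs):
    """Compute PPL from per-token natural-log probabilities:
    PPL = exp(-mean(log p)). Lower is better."""
    n = len(token_logprobs)
    return math.exp(-sum(token_logprobs) / n)

# Toy example: four tokens with assumed log-probabilities.
print(perplexity([-1.2, -0.8, -2.1, -0.5]))  # ~3.16
```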

Others

For Hugging Face version, please see: https://huggingface.co/hfl/chinese-llama-2-13b

Please refer to https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/ for more details.
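To fetch a specific quant file programmatically, the huggingface_hub client can be used; the repo id and filename below are assumptions for illustration, so check this repository's file listing for the exact names:

```python
# Sketch: download one quant file from the Hub.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="hfl/chinese-llama-2-13b-gguf",    # assumed repo id
    filename="chinese-llama-2-13b.Q4_K.gguf",  # hypothetical filename
)
print(path)  # local cache path of the downloaded file
```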
