
Chinese-LLaMA-2-13B-GGUF

This repository contains the GGUF-v3 models (llama.cpp compatible) for Chinese-LLaMA-2-13B.
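As a quick start, here is a minimal sketch of loading one of these files with the llama-cpp-python bindings. The filename, context size, and prompt are illustrative assumptions, not part of this release:

```python
# Minimal sketch: load a GGUF quant with llama-cpp-python and generate text.
from llama_cpp import Llama

llm = Llama(
    model_path="chinese-llama-2-13b.Q4_K.gguf",  # hypothetical filename
    n_ctx=4096,        # context window size (assumption)
    n_gpu_layers=-1,   # offload all layers to GPU; set 0 for CPU-only
)

# Prompt: "Briefly introduce China's Four Great Inventions."
out = llm("请简要介绍一下中国的四大发明。", max_tokens=256, temperature=0.7)
print(out["choices"][0]["text"])
```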

Performance

Metric: PPL (perplexity), lower is better

| Quant | Original            | imatrix (`-im`)     |
|-------|---------------------|---------------------|
| Q2_K  | 14.4701 +/- 0.26107 | 17.4275 +/- 0.31909 |
| Q3_K  | 10.1620 +/- 0.18277 | 9.7486 +/- 0.17744  |
| Q4_0  | 9.8633 +/- 0.17792  | -                   |
| Q4_K  | 9.2735 +/- 0.16793  | 9.2734 +/- 0.16792  |
| Q5_0  | 9.3553 +/- 0.16945  | -                   |
| Q5_K  | 9.1767 +/- 0.16634  | 9.1594 +/- 0.16590  |
| Q6_K  | 9.1326 +/- 0.16546  | 9.1478 +/- 0.16583  |
| Q8_0  | 9.1394 +/- 0.16574  | -                   |
| F16   | 9.1050 +/- 0.16518  | -                   |

Models with the -im suffix were generated with an importance matrix, which generally yields better perplexity, though not always (see the Q2_K and Q6_K rows above).
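For context, perplexity is the exponential of the mean negative log-likelihood over the evaluation tokens. A tiny self-contained sketch of the computation (not llama.cpp's own perplexity tool):

```python
import math

def perplexity(token_logprobs):
    """Compute PPL from per-token natural-log probabilities:
    PPL = exp(-mean(log p)). Lower is better."""
    n = len(token_logprobs)
    return math.exp(-sum(token_logprobs) / n)

# Toy example: four tokens with assumed log-probabilities.
print(perplexity([-1.2, -0.8, -2.1, -0.5]))  # ~3.16
```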

Others

For Hugging Face version, please see: https://huggingface.co/hfl/chinese-llama-2-13b

Please refer to https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/ for more details.
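To fetch a specific quant file programmatically, the huggingface_hub client can be used; the repo id and filename below are assumptions for illustration, so check this repository's file listing for the exact names:

```python
# Sketch: download one quant file from the Hub.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="hfl/chinese-llama-2-13b-gguf",    # assumed repo id
    filename="chinese-llama-2-13b.Q4_K.gguf",  # hypothetical filename
)
print(path)  # local cache path of the downloaded file
```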
