hfl/chinese-mixtral-instruct-gguf

hfl-rc committed on Mar 27, 2024

Commit 1093147 · verified · 1 Parent(s): 0fab619

Update README.md

Files changed (1)

README.md +23 -15

@@ -22,21 +22,29 @@ This repository contains the GGUF-v3 models (llama.cpp compatible) for **Chinese
 Metric: PPL, lower is better
-| Quant | PPL  |
-| ----- | ---- |
-| IQ1_S | 27.7911 +/- 0.27400 |
-| IQ2_XXS | 6.7233 +/- 0.06118 |
-| IQ2_XS | 7.4175 +/- 0.08420 |
-| Q2_K  | 4.5758 +/- 0.03959    |
-| IQ3_XXS | 4.0389 +/- 0.03489 |
-| Q3_K  | 4.5563 +/- 0.04126     |
-| Q4_0  | 3.9757 +/- 0.03455      |
-| Q4_K  | 3.9265 +/- 0.03407    |
-| Q5_0  | 3.9167 +/- 0.03399     |
-| Q5_K  | 3.9232 +/- 0.03403    |
-| Q6_K  | 3.9242 +/- 0.03415     |
-| Q8_0  | 3.9159 +/- 0.03402     |
-| F16   |   x   |
 Due to the file size limitation, for F16 model, please use `cat` command to concatenate all parts into a single file. **You must concatenate these parts in order.**

 Metric: PPL, lower is better
+| Quant   | Size ↓  | PPL                |
+| ------- | ------- | ------------------ |
+| IQ1_S   | 9.8 GB  | 9.5782 +/- 0.08909 |
+| IQ1_M   | 10.8 GB | 7.4666 +/- 0.06741 |
+| IQ2_XXS | 12.3 GB | 6.3923 +/- 0.05674 |
+| IQ2_XS  | 13.7 GB | 6.0606 +/- 0.05834 |
+| IQ2_S   | 14.1 GB | 4.7617 +/- 0.04177 |
+| IQ2_M   | 15.5 GB | 4.5911 +/- 0.04054 |
+| Q2_K    | 17.3 GB | 4.8592 +/- 0.04303 |
+| IQ3_XXS | 18.3 GB | 4.3557 +/- 0.03846 |
+| IQ3_XS  | 19.3 GB | 4.3328 +/- 0.03779 |
+| IQ3_S   | 20.4 GB | 4.3138 +/- 0.03785 |
+| IQ3_M   | 21.4 GB | 4.3024 +/- 0.03775 |
+| Q3_K    | 22.5 GB | 4.4334 +/- 0.03937 |
+| IQ4_XS  | 25.1 GB | 4.2324 +/- 0.03757 |
+| Q4_0    | 26.4 GB | 4.2688 +/- 0.03787 |
+| IQ4_NL  | 26.5 GB | 4.2384 +/- 0.03763 |
+| Q4_K    | 28.4 GB | 4.2433 +/- 0.03768 |
+| Q5_0    | 32.2 GB | 4.2142 +/- 0.03733 |
+| Q5_K    | 33.2 GB | 4.2177 +/- 0.03743 |
+| Q6_K    | 38.4 GB | 4.2184 +/- 0.03754 |
+| Q8_0    | 49.6 GB | 4.2053 +/- 0.03732 |
+| F16     | 93.5 GB | x                  |
 Due to the file size limitation, for F16 model, please use `cat` command to concatenate all parts into a single file. **You must concatenate these parts in order.**
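
The ordering requirement for the F16 parts can be illustrated with a minimal Python sketch that behaves like `cat part1 part2 ... > output`. The README does not state how the parts are named, so the numeric-suffix convention assumed here (for example `model.part1`, `model.part2`) and the helper names `part_index` and `concatenate_parts` are hypothetical. Sorting by the numeric suffix, rather than alphabetically, avoids placing `part10` before `part2`.

```python
import re
import shutil
from pathlib import Path


def part_index(path: Path) -> int:
    """Return the trailing integer of a file name (assumed naming: '<name>.part<N>')."""
    match = re.search(r"(\d+)$", path.name)
    if match is None:
        raise ValueError(f"no numeric suffix in {path.name!r}")
    return int(match.group(1))


def concatenate_parts(parts, output):
    """Write every part into `output` in ascending numeric order.

    Equivalent in effect to `cat part1 part2 ... > output`.
    Returns the parts in the order they were written.
    """
    ordered = sorted((Path(p) for p in parts), key=part_index)
    with open(output, "wb") as dst:
        for part in ordered:
            with open(part, "rb") as src:
                # Stream in chunks so a multi-GB part never has to fit in memory.
                shutil.copyfileobj(src, dst)
    return ordered
```

To use it, pass the list of downloaded part files and a destination path that does not match any part name; after concatenation, the combined size should equal the sum of the part sizes.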