Update README.md
README.md
CHANGED
@@ -50,13 +50,16 @@ model.save_quantized(quantized_model_dir)
## Evaluation

### Open LLM Leaderboard evaluation scores

Model evaluation results were obtained via [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness).

| Benchmark | Meta-Llama-3-70B-Instruct | Meta-Llama-3-70B-Instruct-FP8 | Meta-Llama-3-70B-Instruct-FP8-KV<br>(this model) |
| :-------------------------------------------------------: | :-----------------------: | :---------------------------: | :----------------------------------------------: |
| [ARC-c](https://arxiv.org/abs/1911.01547)<br>25-shot | 72.69 | 72.61 | 72.57 |
| [HellaSwag](https://arxiv.org/abs/1905.07830)<br>10-shot | 85.50 | 85.41 | 85.32 |
| [MMLU](https://arxiv.org/abs/2009.03300)<br>5-shot | 80.18 | 80.06 | 79.69 |
| [TruthfulQA](https://arxiv.org/abs/2109.07958)<br>0-shot | 62.90 | 62.73 | 61.92 |
| [WinoGrande](https://arxiv.org/abs/1907.10641)<br>5-shot | 83.34 | 83.03 | 83.66 |
| [GSM8K](https://arxiv.org/abs/2110.14168)<br>5-shot | 92.49 | 91.12 | 90.83 |
| **Average<br>Accuracy** | **79.51** | **79.16** | **79.00** |
| **Recovery** | **100%** | **99.55%** | **99.36%** |
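As a rough guide to reproducing a row of this table, the sketch below scores one benchmark through lm-evaluation-harness's Python API (`lm_eval.simple_evaluate`). The backend choice, checkpoint path, and batch size are assumptions, not the exact configuration used to produce the numbers above.

```python
# Sketch: reproduce the ARC-c (25-shot) row with lm-evaluation-harness.
# Assumptions: the FP8 checkpoint is run through vLLM (FP8 weights generally
# need a backend with FP8 support), and the pretrained path is illustrative.
import lm_eval

results = lm_eval.simple_evaluate(
    model="vllm",
    model_args="pretrained=Meta-Llama-3-70B-Instruct-FP8-KV",  # assumed local path or repo id
    tasks=["arc_challenge"],  # the other rows correspond to hellaswag, mmlu, truthfulqa, winogrande, gsm8k
    num_fewshot=25,           # 25-shot, matching the table
    batch_size=8,             # illustrative; tune to available GPU memory
)

print(results["results"]["arc_challenge"])
```

Recovery appears to be the quantized model's average accuracy relative to the unquantized baseline (e.g. 79.00 / 79.51 ≈ 99.36% for this model); small differences against the reported figures can come from rounding of the per-task scores.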