Commit 7394f60 by mgoin
Parent: 71a4b63

Update README.md

Files changed (1)
  1. README.md +13 -10
README.md CHANGED
@@ -50,13 +50,16 @@ model.save_quantized(quantized_model_dir)
 ## Evaluation
 
 ### Open LLM Leaderboard evaluation scores
-| | Meta-Llama-3-70B-Instruct | Meta-Llama-3-70B-Instruct-FP8 | Meta-Llama-3-70B-Instruct-FP8-KV<br>(this model) |
-| :----------------------: | :-----------------------: | :---------------------------: | :----------------------------------------------: |
-| arc-c<br>25-shot | 72.69 | 72.61 | 72.57 |
-| hellaswag<br>10-shot | 85.50 | 85.41 | 85.32 |
-| mmlu<br>5-shot | 80.18 | 80.06 | 79.69 |
-| truthfulqa<br>0-shot | 62.90 | 62.73 | 61.92 |
-| winogrande<br>5-shot | 83.34 | 83.03 | 83.66 |
-| gsm8k<br>5-shot | 92.49 | 91.12 | 90.83 |
-| **Average<br>Accuracy** | **79.51** | **79.16** | **79.00** |
-| **Recovery** | **100%** | **99.55%** | **99.36%** |
+
+Model evaluation results obtained via [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness).
+
+| Benchmark | Meta-Llama-3-70B-Instruct | Meta-Llama-3-70B-Instruct-FP8 | Meta-Llama-3-70B-Instruct-FP8-KV<br>(this model) |
+| :-------------------------------------------------------: | :-----------------------: | :---------------------------: | :----------------------------------------------: |
+| [ARC-c](https://arxiv.org/abs/1911.01547)<br> 25-shot | 72.69 | 72.61 | 72.57 |
+| [HellaSwag](https://arxiv.org/abs/1905.07830)<br> 10-shot | 85.50 | 85.41 | 85.32 |
+| [MMLU](https://arxiv.org/abs/2009.03300)<br> 5-shot | 80.18 | 80.06 | 79.69 |
+| [TruthfulQA](https://arxiv.org/abs/2109.07958)<br> 0-shot | 62.90 | 62.73 | 61.92 |
+| [WinoGrande](https://arxiv.org/abs/1907.10641)<br> 5-shot | 83.34 | 83.03 | 83.66 |
+| [GSM8K](https://arxiv.org/abs/2110.14168)<br> 5-shot | 92.49 | 91.12 | 90.83 |
+| **Average<br>Accuracy** | **79.51** | **79.16** | **79.00** |
+| **Recovery** | **100%** | **99.55%** | **99.36%** |
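
Reviewer note: the **Average Accuracy** and **Recovery** rows can be cross-checked from the per-benchmark scores. The sketch below is a rough check rather than the script the author used. It assumes the average is a plain arithmetic mean over the six benchmarks and that recovery is each model's average divided by the baseline average; neither definition is stated in the README. The names `SCORES`, `MODELS`, `column_averages`, and `recovery` are illustrative only.

```python
# Per-benchmark scores copied from the table in this diff.
# Column order: baseline, FP8, FP8-KV (this model).
SCORES = {
    "arc-c": (72.69, 72.61, 72.57),
    "hellaswag": (85.50, 85.41, 85.32),
    "mmlu": (80.18, 80.06, 79.69),
    "truthfulqa": (62.90, 62.73, 61.92),
    "winogrande": (83.34, 83.03, 83.66),
    "gsm8k": (92.49, 91.12, 90.83),
}

MODELS = (
    "Meta-Llama-3-70B-Instruct",
    "Meta-Llama-3-70B-Instruct-FP8",
    "Meta-Llama-3-70B-Instruct-FP8-KV",
)


def column_averages(scores):
    """Arithmetic mean of each model's column (assumed definition of the average)."""
    columns = list(zip(*scores.values()))
    return [sum(col) / len(col) for col in columns]


def recovery(averages):
    """Each average as a percentage of the first (baseline) average (assumed definition)."""
    baseline = averages[0]
    return [100.0 * avg / baseline for avg in averages]


averages = column_averages(SCORES)
recoveries = recovery(averages)

for name, avg, rec in zip(MODELS, averages, recoveries):
    print(f"{name}: average={avg:.2f} recovery={rec:.2f}%")
```

Under these assumptions the computed values agree with the table to within about 0.01; the last digit can differ depending on whether the averages are rounded before the recovery ratio is taken.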