Commit d11fb56 by siddartha-abacus (1 parent: a507cd2)

Update README.md

Files changed (1)
  1. README.md +16 -0
README.md CHANGED
@@ -131,4 +131,20 @@ Meta-Llama-3-70B-Instruct 9.006250
131
  | GPT-4-Turbo | 9.38 | 9.00 | 9.19 |
132
  | Meta-Llama-3-70B-Instruct | 9.21 | 8.80 | 9.01 |
133
 
134
+ ### OpenLLM Leaderboard Manual Evaluation
135
+
136
+ | Model | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K* |
137
+ | :---- | ---: | ------: | ---: | ---: | ---: | ---: |
138
+ | Smaug-Llama-3-70B-Instruct | 70.5 | 86.1 | 79.2 | 62.5 | 83.5 | 90.5 |
139
+ | Llama-3-70B-Instruct | 71.4 | 85.7 | 80.1 | 61.8 | 82.9 | 91.1 |
140
+
141
+ **\*GSM8K:** The GSM8K numbers quoted here were computed using a recent release
142
+ of the [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness/).
143
+ The commit used by the leaderboard has a bug in the stop-word configuration for
144
+ GSM8K that significantly affects models that tend to use `:` in their responses.
145
+ The issue is discussed in more detail at [GSM8K eval issue](http://fixme).
146
+ The scores for both Llama-3 and this model differ significantly from the leaderboard's
147
+ results when evaluated with the updated harness, in which the stop-word issue has been addressed.
148
+
149
+
150
  This version of Smaug uses new techniques and new data compared to [Smaug-72B](https://huggingface.co/abacusai/Smaug-72B-v0.1), and more information will be released later on. For now, see the previous Smaug paper: https://arxiv.org/abs/2402.13228.
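
The GSM8K note in this change attributes the score difference to a stop-word bug that affects responses containing `:`. The sketch below illustrates that general failure mode only. It is not the LM Evaluation Harness code, and the stop sequences, the sample response, and the helper names `truncate_at_stop` and `extract_last_number` are all hypothetical, chosen for illustration.

```python
import re


def truncate_at_stop(text, stop_sequences):
    """Cut the text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


def extract_last_number(text):
    """Return the last number found in the text as a string, or None."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text)
    return matches[-1] if matches else None


# Hypothetical model response that uses ":" before giving its reasoning.
response = "Let's work through it: 3 apples plus 4 apples is 7 apples. The answer is 7."

# Hypothetical stop lists; the real harness configuration is not shown in this README.
overly_broad_stops = [":"]
narrower_stops = ["\n\nQuestion"]

truncated_answer = extract_last_number(truncate_at_stop(response, overly_broad_stops))
full_answer = extract_last_number(truncate_at_stop(response, narrower_stops))

print(truncated_answer)
print(full_answer)
```

With the overly broad stop list, generation is cut before any number appears, so no answer can be extracted and a correct response would be scored as wrong. With the narrower list, the full response survives and the final number is recovered.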