Adding Evaluation Results

#38
Files changed (1)
  1. README.md +14 -1
README.md CHANGED
@@ -166,4 +166,17 @@ Thanks to everyone who have helped out one way or another (listed alphabetically
  - [Leo Gao](https://twitter.com/nabla_theta) for running zero shot evaluations for the baseline models for the table.
  - [Laurence Golding](https://github.com/researcher2/) for adding some features to the web demo.
  - [Aran Komatsuzaki](https://twitter.com/arankomatsuzaki) for advice with experiment design and writing the blog posts.
- - [Janko Prester](https://github.com/jprester/) for creating the web demo frontend.
+ - [Janko Prester](https://github.com/jprester/) for creating the web demo frontend.
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_EleutherAI__gpt-j-6b)
+
+ | Metric                | Value                     |
+ |-----------------------|---------------------------|
+ | Avg.                  | 34.87                     |
+ | ARC (25-shot)         | 41.38                     |
+ | HellaSwag (10-shot)   | 67.54                     |
+ | MMLU (5-shot)         | 26.78                     |
+ | TruthfulQA (0-shot)   | 35.96                     |
+ | Winogrande (5-shot)   | 65.98                     |
+ | GSM8K (5-shot)        | 1.82                      |
+ | DROP (3-shot)         | 4.62                      |
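
For anyone sanity-checking the numbers: the Avg. row appears to be the plain arithmetic mean of the seven benchmark scores. A minimal Python sketch, using only the values copied from the table above:

```python
# Verify that the Avg. row is the unweighted mean of the seven benchmarks.
scores = {
    "ARC (25-shot)": 41.38,
    "HellaSwag (10-shot)": 67.54,
    "MMLU (5-shot)": 26.78,
    "TruthfulQA (0-shot)": 35.96,
    "Winogrande (5-shot)": 65.98,
    "GSM8K (5-shot)": 1.82,
    "DROP (3-shot)": 4.62,
}
avg = sum(scores.values()) / len(scores)
print(f"{avg:.2f}")  # 34.87, matching the Avg. row above
```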
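To poke at the per-task details behind these numbers, the dataset linked above can be inspected with `huggingface_hub`. A minimal sketch, assuming only that the repo name from the link is correct; the file layout inside it is not documented in this PR, so list the files before trying to load anything:

```python
from huggingface_hub import list_repo_files

# Enumerate the files in the detailed-results dataset linked above.
for path in list_repo_files(
    "open-llm-leaderboard/details_EleutherAI__gpt-j-6b",
    repo_type="dataset",
):
    print(path)
```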