Adding Evaluation Results

#1
Files changed (1)
  1. README.md +14 -1
README.md CHANGED
@@ -79,4 +79,17 @@ A small 81M param (total) decoder model, enabled through tying the input/output
 - slightly larger 101M param GQA pretrained version: [here](https://huggingface.co/BEE-spoke-data/smol_llama-101M-GQA)
 - For the chat version of this model, please [see here](https://youtu.be/dQw4w9WgXcQ?si=3ePIqrY1dw94KMu4)
 
----
+---
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_BEE-spoke-data__smol_llama-81M-tied)
+
+| Metric              | Value |
+|---------------------|-------|
+| Avg.                | 24.52 |
+| ARC (25-shot)       | 22.18 |
+| HellaSwag (10-shot) | 29.33 |
+| MMLU (5-shot)       | 24.06 |
+| TruthfulQA (0-shot) | 43.97 |
+| Winogrande (5-shot) | 49.25 |
+| GSM8K (5-shot)      |  0.23 |
+| DROP (3-shot)       |  2.64 |
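Assuming the `Avg.` row is the unweighted mean of the seven benchmark scores (the leaderboard's usual convention; an assumption, not stated in this PR), the figures in the table are self-consistent. A quick sketch to check:

```python
# Benchmark scores copied from the table in this PR.
scores = {
    "ARC (25-shot)": 22.18,
    "HellaSwag (10-shot)": 29.33,
    "MMLU (5-shot)": 24.06,
    "TruthfulQA (0-shot)": 43.97,
    "Winogrande (5-shot)": 49.25,
    "GSM8K (5-shot)": 0.23,
    "DROP (3-shot)": 2.64,
}

# Unweighted mean across the seven benchmarks.
avg = sum(scores.values()) / len(scores)
print(round(avg, 2))  # -> 24.52, matching the Avg. row
```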