Adding Evaluation Results

#3
Files changed (1)
  1. README.md +14 -1
README.md CHANGED
@@ -200,4 +200,17 @@ Following Mitchell et al. (2018), we provide a model card for GPT-SW3.
  - If the dataset relates to people, are there applicable limits on the retention of the data associated with the instances (e.g., were individuals in question told that their data would be retained for a fixed period of time and then deleted)? If so, please describe these limits and explain how they will be enforced. Read the privacy policy for the NLU initiative at AI Sweden [here](https://www.ai.se/en/privacy-policy-nlu).
  - Will older versions of the dataset continue to be supported/hosted/maintained? If so, please describe how. If not, please describe how its obsolescence will be communicated to users. N/A.
  - If others want to extend/augment/build on/contribute to the dataset, is there a mechanism for them to do so? If so, please provide a description. Will these contributions be validated/ verified? If so, please describe how. If not, why not? Is there a process for communicating/ distributing these contributions to other users? If so, please provide a description. Not at this time.
- - Any other comments? No.
+ - Any other comments? No.
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_AI-Sweden-Models__gpt-sw3-1.3b)
+
+ | Metric | Value |
+ |-----------------------|---------------------------|
+ | Avg. | 29.99 |
+ | ARC (25-shot) | 30.38 |
+ | HellaSwag (10-shot) | 50.4 |
+ | MMLU (5-shot) | 26.14 |
+ | TruthfulQA (0-shot) | 39.97 |
+ | Winogrande (5-shot) | 58.88 |
+ | GSM8K (5-shot) | 0.08 |
+ | DROP (3-shot) | 4.08 |
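
A quick sanity check on the added table: the `Avg.` row appears to be the unweighted mean of the seven benchmark scores. The sketch below assumes equal weighting (the PR itself does not state how the average is computed) and uses a hypothetical helper name, `leaderboard_average`, with the values copied from the table.

```python
from typing import Mapping

# Scores copied from the table added in this PR.
scores = {
    "ARC (25-shot)": 30.38,
    "HellaSwag (10-shot)": 50.4,
    "MMLU (5-shot)": 26.14,
    "TruthfulQA (0-shot)": 39.97,
    "Winogrande (5-shot)": 58.88,
    "GSM8K (5-shot)": 0.08,
    "DROP (3-shot)": 4.08,
}


def leaderboard_average(results: Mapping[str, float]) -> float:
    """Unweighted mean of the per-benchmark scores, rounded to two decimals.

    Equal weighting is an assumption, not something the PR documents.
    """
    return round(sum(results.values()) / len(results), 2)


average = leaderboard_average(scores)
print(f"Avg. = {average}")  # prints "Avg. = 29.99"
```

Under that assumption the computed mean matches the reported `Avg.` value of 29.99, so the table is internally consistent.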