leaderboard-pr-bot commited on
Commit
8fe17d2
1 Parent(s): 1af2799

Adding Evaluation Results

Browse files

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1) hide show
  1. README.md +118 -2
README.md CHANGED
@@ -1,11 +1,114 @@
1
  ---
2
- license: other
3
  language:
4
  - da
5
  - sv
6
  - 'no'
7
  - en
8
  - is
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
9
  ---
10
  # Model description
11
  [AI Sweden](https://huggingface.co/AI-Sweden-Models/)
@@ -207,4 +310,17 @@ Following Mitchell et al. (2018), we provide a model card for GPT-SW3.
207
  - If the dataset relates to people, are there applicable limits on the retention of the data associated with the instances (e.g., were individuals in question told that their data would be retained for a fixed period of time and then deleted)? If so, please describe these limits and explain how they will be enforced. Read the privacy policy for the NLU initiative at AI Sweden [here](https://www.ai.se/en/privacy-policy-nlu).
208
  - Will older versions of the dataset continue to be supported/hosted/maintained? If so, please describe how. If not, please describe how its obsolescence will be communicated to users. N/A.
209
  - If others want to extend/augment/build on/contribute to the dataset, is there a mechanism for them to do so? If so, please provide a description. Will these contributions be validated/ verified? If so, please describe how. If not, why not? Is there a process for communicating/ distributing these contributions to other users? If so, please provide a description. Not at this time.
210
- - Any other comments? No.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
 
2
  language:
3
  - da
4
  - sv
5
  - 'no'
6
  - en
7
  - is
8
+ license: other
9
+ model-index:
10
+ - name: gpt-sw3-40b
11
+ results:
12
+ - task:
13
+ type: text-generation
14
+ name: Text Generation
15
+ dataset:
16
+ name: AI2 Reasoning Challenge (25-Shot)
17
+ type: ai2_arc
18
+ config: ARC-Challenge
19
+ split: test
20
+ args:
21
+ num_few_shot: 25
22
+ metrics:
23
+ - type: acc_norm
24
+ value: 43.0
25
+ name: normalized accuracy
26
+ source:
27
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AI-Sweden-Models/gpt-sw3-40b
28
+ name: Open LLM Leaderboard
29
+ - task:
30
+ type: text-generation
31
+ name: Text Generation
32
+ dataset:
33
+ name: HellaSwag (10-Shot)
34
+ type: hellaswag
35
+ split: validation
36
+ args:
37
+ num_few_shot: 10
38
+ metrics:
39
+ - type: acc_norm
40
+ value: 72.37
41
+ name: normalized accuracy
42
+ source:
43
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AI-Sweden-Models/gpt-sw3-40b
44
+ name: Open LLM Leaderboard
45
+ - task:
46
+ type: text-generation
47
+ name: Text Generation
48
+ dataset:
49
+ name: MMLU (5-Shot)
50
+ type: cais/mmlu
51
+ config: all
52
+ split: test
53
+ args:
54
+ num_few_shot: 5
55
+ metrics:
56
+ - type: acc
57
+ value: 34.97
58
+ name: accuracy
59
+ source:
60
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AI-Sweden-Models/gpt-sw3-40b
61
+ name: Open LLM Leaderboard
62
+ - task:
63
+ type: text-generation
64
+ name: Text Generation
65
+ dataset:
66
+ name: TruthfulQA (0-shot)
67
+ type: truthful_qa
68
+ config: multiple_choice
69
+ split: validation
70
+ args:
71
+ num_few_shot: 0
72
+ metrics:
73
+ - type: mc2
74
+ value: 37.52
75
+ source:
76
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AI-Sweden-Models/gpt-sw3-40b
77
+ name: Open LLM Leaderboard
78
+ - task:
79
+ type: text-generation
80
+ name: Text Generation
81
+ dataset:
82
+ name: Winogrande (5-shot)
83
+ type: winogrande
84
+ config: winogrande_xl
85
+ split: validation
86
+ args:
87
+ num_few_shot: 5
88
+ metrics:
89
+ - type: acc
90
+ value: 67.96
91
+ name: accuracy
92
+ source:
93
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AI-Sweden-Models/gpt-sw3-40b
94
+ name: Open LLM Leaderboard
95
+ - task:
96
+ type: text-generation
97
+ name: Text Generation
98
+ dataset:
99
+ name: GSM8k (5-shot)
100
+ type: gsm8k
101
+ config: main
102
+ split: test
103
+ args:
104
+ num_few_shot: 5
105
+ metrics:
106
+ - type: acc
107
+ value: 4.7
108
+ name: accuracy
109
+ source:
110
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AI-Sweden-Models/gpt-sw3-40b
111
+ name: Open LLM Leaderboard
112
  ---
113
  # Model description
114
  [AI Sweden](https://huggingface.co/AI-Sweden-Models/)
 
310
  - If the dataset relates to people, are there applicable limits on the retention of the data associated with the instances (e.g., were individuals in question told that their data would be retained for a fixed period of time and then deleted)? If so, please describe these limits and explain how they will be enforced. Read the privacy policy for the NLU initiative at AI Sweden [here](https://www.ai.se/en/privacy-policy-nlu).
311
  - Will older versions of the dataset continue to be supported/hosted/maintained? If so, please describe how. If not, please describe how its obsolescence will be communicated to users. N/A.
312
  - If others want to extend/augment/build on/contribute to the dataset, is there a mechanism for them to do so? If so, please provide a description. Will these contributions be validated/ verified? If so, please describe how. If not, why not? Is there a process for communicating/ distributing these contributions to other users? If so, please provide a description. Not at this time.
313
+ - Any other comments? No.
314
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
315
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_AI-Sweden-Models__gpt-sw3-40b)
316
+
317
+ | Metric |Value|
318
+ |---------------------------------|----:|
319
+ |Avg. |43.42|
320
+ |AI2 Reasoning Challenge (25-Shot)|43.00|
321
+ |HellaSwag (10-Shot) |72.37|
322
+ |MMLU (5-Shot) |34.97|
323
+ |TruthfulQA (0-shot) |37.52|
324
+ |Winogrande (5-shot) |67.96|
325
+ |GSM8k (5-shot) | 4.70|
326
+