leaderboard-pr-bot committed on
Commit
595938d
1 Parent(s): 05d80f7

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +123 -7
README.md CHANGED
@@ -1,18 +1,121 @@
  ---
- license: other
- datasets:
- - databricks/databricks-dolly-15k
- - laion/OIG
- - OpenAssistant/oasst1
  language:
  - da
  - sv
  - en
  - 'no'
  - is
+ license: other
+ datasets:
+ - databricks/databricks-dolly-15k
+ - laion/OIG
+ - OpenAssistant/oasst1
  pipeline_tag: conversational
  widget:
- - text: "Jens Peter Hansen kommer fra Danmark"
+ - text: Jens Peter Hansen kommer fra Danmark
+ model-index:
+ - name: gpt-sw3-356m-instruct
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 26.96
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AI-Sweden-Models/gpt-sw3-356m-instruct
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 38.01
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AI-Sweden-Models/gpt-sw3-356m-instruct
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 25.53
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AI-Sweden-Models/gpt-sw3-356m-instruct
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 40.74
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AI-Sweden-Models/gpt-sw3-356m-instruct
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 52.57
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AI-Sweden-Models/gpt-sw3-356m-instruct
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 1.74
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AI-Sweden-Models/gpt-sw3-356m-instruct
+       name: Open LLM Leaderboard
  ---
  # Model description
  [AI Sweden](https://huggingface.co/AI-Sweden-Models/)
@@ -274,4 +377,17 @@ Following Mitchell et al. (2018), we provide a model card for GPT-SW3.
  - If the dataset relates to people, are there applicable limits on the retention of the data associated with the instances (e.g., were individuals in question told that their data would be retained for a fixed period of time and then deleted)? If so, please describe these limits and explain how they will be enforced. Read the privacy policy for the NLU initiative at AI Sweden [here](https://www.ai.se/en/privacy-policy-nlu).
  - Will older versions of the dataset continue to be supported/hosted/maintained? If so, please describe how. If not, please describe how its obsolescence will be communicated to users. N/A.
  - If others want to extend/augment/build on/contribute to the dataset, is there a mechanism for them to do so? If so, please provide a description. Will these contributions be validated/ verified? If so, please describe how. If not, why not? Is there a process for communicating/ distributing these contributions to other users? If so, please provide a description. Not at this time.
- - Any other comments? No.
+ - Any other comments? No.
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_AI-Sweden-Models__gpt-sw3-356m-instruct)
+
+ |             Metric              |Value|
+ |---------------------------------|----:|
+ |Avg.                             |30.93|
+ |AI2 Reasoning Challenge (25-Shot)|26.96|
+ |HellaSwag (10-Shot)              |38.01|
+ |MMLU (5-Shot)                    |25.53|
+ |TruthfulQA (0-shot)              |40.74|
+ |Winogrande (5-shot)              |52.57|
+ |GSM8k (5-shot)                   | 1.74|
+
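The `Avg.` row in the added table appears to be the unweighted mean of the six benchmark scores; a minimal sanity check, using the score values from the diff (the dictionary and variable names are only illustrative):

```python
# Recompute the leaderboard "Avg." from the six per-benchmark scores
# reported in the model-index (values taken from the diff above).
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 26.96,
    "HellaSwag (10-Shot)": 38.01,
    "MMLU (5-Shot)": 25.53,
    "TruthfulQA (0-shot)": 40.74,
    "Winogrande (5-shot)": 52.57,
    "GSM8k (5-shot)": 1.74,
}

avg = sum(scores.values()) / len(scores)
# Agrees with the reported 30.93 up to two-decimal rounding.
assert abs(avg - 30.93) < 0.01
print(avg)
```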