Adding Evaluation Results (#7), opened by leaderboard-pr-bot
README.md
CHANGED
@@ -1,18 +1,121 @@
|
1 |
---
|
2 |
-
license: other
|
3 |
-
datasets:
|
4 |
-
- databricks/databricks-dolly-15k
|
5 |
-
- laion/OIG
|
6 |
-
- OpenAssistant/oasst1
|
7 |
language:
|
8 |
- da
|
9 |
- sv
|
10 |
- en
|
11 |
- 'no'
|
12 |
- is
|
13 |
pipeline_tag: conversational
|
14 |
widget:
|
15 |
-
- text:
|
16 |
---
|
17 |
# Model description
|
18 |
[AI Sweden](https://huggingface.co/AI-Sweden-Models/)
|
@@ -274,4 +377,17 @@ Following Mitchell et al. (2018), we provide a model card for GPT-SW3.
|
274 |
- If the dataset relates to people, are there applicable limits on the retention of the data associated with the instances (e.g., were individuals in question told that their data would be retained for a fixed period of time and then deleted)? If so, please describe these limits and explain how they will be enforced. Read the privacy policy for the NLU initiative at AI Sweden [here](https://www.ai.se/en/privacy-policy-nlu).
|
275 |
- Will older versions of the dataset continue to be supported/hosted/maintained? If so, please describe how. If not, please describe how its obsolescence will be communicated to users. N/A.
|
276 |
- If others want to extend/augment/build on/contribute to the dataset, is there a mechanism for them to do so? If so, please provide a description. Will these contributions be validated/ verified? If so, please describe how. If not, why not? Is there a process for communicating/ distributing these contributions to other users? If so, please provide a description. Not at this time.
|
277 |
-
- Any other comments? No.
|
1 |
---
|
2 |
language:
|
3 |
- da
|
4 |
- sv
|
5 |
- en
|
6 |
- 'no'
|
7 |
- is
|
8 |
+
license: other
|
9 |
+
datasets:
|
10 |
+
- databricks/databricks-dolly-15k
|
11 |
+
- laion/OIG
|
12 |
+
- OpenAssistant/oasst1
|
13 |
pipeline_tag: conversational
|
14 |
widget:
|
15 |
+
- text: Jens Peter Hansen kommer fra Danmark
|
16 |
+
model-index:
|
17 |
+
- name: gpt-sw3-356m-instruct
|
18 |
+
results:
|
19 |
+
- task:
|
20 |
+
type: text-generation
|
21 |
+
name: Text Generation
|
22 |
+
dataset:
|
23 |
+
name: AI2 Reasoning Challenge (25-Shot)
|
24 |
+
type: ai2_arc
|
25 |
+
config: ARC-Challenge
|
26 |
+
split: test
|
27 |
+
args:
|
28 |
+
num_few_shot: 25
|
29 |
+
metrics:
|
30 |
+
- type: acc_norm
|
31 |
+
value: 26.96
|
32 |
+
name: normalized accuracy
|
33 |
+
source:
|
34 |
+
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AI-Sweden-Models/gpt-sw3-356m-instruct
|
35 |
+
name: Open LLM Leaderboard
|
36 |
+
- task:
|
37 |
+
type: text-generation
|
38 |
+
name: Text Generation
|
39 |
+
dataset:
|
40 |
+
name: HellaSwag (10-Shot)
|
41 |
+
type: hellaswag
|
42 |
+
split: validation
|
43 |
+
args:
|
44 |
+
num_few_shot: 10
|
45 |
+
metrics:
|
46 |
+
- type: acc_norm
|
47 |
+
value: 38.01
|
48 |
+
name: normalized accuracy
|
49 |
+
source:
|
50 |
+
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AI-Sweden-Models/gpt-sw3-356m-instruct
|
51 |
+
name: Open LLM Leaderboard
|
52 |
+
- task:
|
53 |
+
type: text-generation
|
54 |
+
name: Text Generation
|
55 |
+
dataset:
|
56 |
+
name: MMLU (5-Shot)
|
57 |
+
type: cais/mmlu
|
58 |
+
config: all
|
59 |
+
split: test
|
60 |
+
args:
|
61 |
+
num_few_shot: 5
|
62 |
+
metrics:
|
63 |
+
- type: acc
|
64 |
+
value: 25.53
|
65 |
+
name: accuracy
|
66 |
+
source:
|
67 |
+
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AI-Sweden-Models/gpt-sw3-356m-instruct
|
68 |
+
name: Open LLM Leaderboard
|
69 |
+
- task:
|
70 |
+
type: text-generation
|
71 |
+
name: Text Generation
|
72 |
+
dataset:
|
73 |
+
name: TruthfulQA (0-shot)
|
74 |
+
type: truthful_qa
|
75 |
+
config: multiple_choice
|
76 |
+
split: validation
|
77 |
+
args:
|
78 |
+
num_few_shot: 0
|
79 |
+
metrics:
|
80 |
+
- type: mc2
|
81 |
+
value: 40.74
|
82 |
+
source:
|
83 |
+
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AI-Sweden-Models/gpt-sw3-356m-instruct
|
84 |
+
name: Open LLM Leaderboard
|
85 |
+
- task:
|
86 |
+
type: text-generation
|
87 |
+
name: Text Generation
|
88 |
+
dataset:
|
89 |
+
name: Winogrande (5-shot)
|
90 |
+
type: winogrande
|
91 |
+
config: winogrande_xl
|
92 |
+
split: validation
|
93 |
+
args:
|
94 |
+
num_few_shot: 5
|
95 |
+
metrics:
|
96 |
+
- type: acc
|
97 |
+
value: 52.57
|
98 |
+
name: accuracy
|
99 |
+
source:
|
100 |
+
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AI-Sweden-Models/gpt-sw3-356m-instruct
|
101 |
+
name: Open LLM Leaderboard
|
102 |
+
- task:
|
103 |
+
type: text-generation
|
104 |
+
name: Text Generation
|
105 |
+
dataset:
|
106 |
+
name: GSM8k (5-shot)
|
107 |
+
type: gsm8k
|
108 |
+
config: main
|
109 |
+
split: test
|
110 |
+
args:
|
111 |
+
num_few_shot: 5
|
112 |
+
metrics:
|
113 |
+
- type: acc
|
114 |
+
value: 1.74
|
115 |
+
name: accuracy
|
116 |
+
source:
|
117 |
+
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AI-Sweden-Models/gpt-sw3-356m-instruct
|
118 |
+
name: Open LLM Leaderboard
|
119 |
---
|
120 |
# Model description
|
121 |
[AI Sweden](https://huggingface.co/AI-Sweden-Models/)
|
377 |
- If the dataset relates to people, are there applicable limits on the retention of the data associated with the instances (e.g., were individuals in question told that their data would be retained for a fixed period of time and then deleted)? If so, please describe these limits and explain how they will be enforced. Read the privacy policy for the NLU initiative at AI Sweden [here](https://www.ai.se/en/privacy-policy-nlu).
|
378 |
- Will older versions of the dataset continue to be supported/hosted/maintained? If so, please describe how. If not, please describe how its obsolescence will be communicated to users. N/A.
|
379 |
- If others want to extend/augment/build on/contribute to the dataset, is there a mechanism for them to do so? If so, please provide a description. Will these contributions be validated/ verified? If so, please describe how. If not, why not? Is there a process for communicating/ distributing these contributions to other users? If so, please provide a description. Not at this time.
|
380 |
+
- Any other comments? No.
|
381 |
+
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
|
382 |
+
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_AI-Sweden-Models__gpt-sw3-356m-instruct)
|
383 |
+
|
384 |
+
| Metric |Value|
|
385 |
+
|---------------------------------|----:|
|
386 |
+
|Avg. |30.93|
|
387 |
+
|AI2 Reasoning Challenge (25-Shot)|26.96|
|
388 |
+
|HellaSwag (10-Shot) |38.01|
|
389 |
+
|MMLU (5-Shot) |25.53|
|
390 |
+
|TruthfulQA (0-shot) |40.74|
|
391 |
+
|Winogrande (5-shot) |52.57|
|
392 |
+
|GSM8k (5-shot) | 1.74|
|
393 |
+
|
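The `Avg.` row in the results table appears to be the unweighted mean of the six benchmark scores. The sketch below checks that arithmetic using the values copied from the table; the dictionary name `scores` and the assumption that the leaderboard uses a plain mean are illustrative, not taken from the leaderboard's own code.

```python
# Benchmark scores copied from the results table in this PR.
# Assumption: the "Avg." row is a simple unweighted mean of these six values.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 26.96,
    "HellaSwag (10-Shot)": 38.01,
    "MMLU (5-Shot)": 25.53,
    "TruthfulQA (0-shot)": 40.74,
    "Winogrande (5-shot)": 52.57,
    "GSM8k (5-shot)": 1.74,
}

# Unweighted mean across all benchmarks.
average = sum(scores.values()) / len(scores)

# Print with three decimals so the value can be compared with the table's
# two-decimal "Avg." entry.
print(f"Average over {len(scores)} benchmarks: {average:.3f}")
```

The computed mean is about 30.925, which agrees with the reported 30.93 to within rounding.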