leaderboard-pr-bot committed
Commit 348a4e9
1 parent: 995200d

Adding Evaluation Results

Browse files

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions
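The evaluation results this PR adds live in the `model-index` block of the README's YAML front matter, so they can be read back out programmatically. A minimal sketch, assuming the third-party PyYAML package is installed; the sample front matter below is abbreviated to a single metric:

```python
import yaml  # PyYAML (third-party): pip install pyyaml

# Abbreviated README with YAML front matter, as a stand-in for the real file
readme = """---
model-index:
- name: starcoder2-15b
  results:
  - task:
      type: text-generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
    metrics:
    - type: acc
      value: 52.24
---
# StarCoder2
"""

# The front matter sits between the first two '---' delimiters
front_matter = readme.split("---", 2)[1]
card = yaml.safe_load(front_matter)

for result in card["model-index"][0]["results"]:
    for metric in result["metrics"]:
        print(result["dataset"]["name"], metric["type"], metric["value"])
# GSM8k (5-shot) acc 52.24
```

The same approach works on the full card after this PR is merged; each leaderboard task becomes one entry under `results`.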

Files changed (1)
  1. README.md +120 -7
README.md CHANGED
@@ -1,4 +1,10 @@
 ---
+license: bigcode-openrail-m
+library_name: transformers
+tags:
+- code
+datasets:
+- bigcode/the-stack-v2-train
 pipeline_tag: text-generation
 inference:
   parameters:
@@ -8,12 +14,6 @@ widget:
 - text: 'def print_hello_world():'
   example_title: Hello world
   group: Python
-datasets:
-- bigcode/the-stack-v2-train
-license: bigcode-openrail-m
-library_name: transformers
-tags:
-- code
 model-index:
 - name: starcoder2-15b
   results:
@@ -65,6 +65,106 @@ model-index:
     metrics:
     - type: edit-smiliarity
       value: 74.08
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 47.35
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=bigcode/starcoder2-15b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 64.09
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=bigcode/starcoder2-15b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 51.35
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=bigcode/starcoder2-15b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 37.87
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=bigcode/starcoder2-15b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 63.85
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=bigcode/starcoder2-15b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 52.24
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=bigcode/starcoder2-15b
+      name: Open LLM Leaderboard
 ---
 
 # StarCoder2
@@ -212,4 +312,17 @@ The model is licensed under the BigCode OpenRAIL-M v1 license agreement. You can
   archivePrefix={arXiv},
   primaryClass={cs.SE}
 }
-```
+```
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_bigcode__starcoder2-15b)
+
+| Metric                          |Value|
+|---------------------------------|----:|
+|Avg.                             |52.79|
+|AI2 Reasoning Challenge (25-Shot)|47.35|
+|HellaSwag (10-Shot)              |64.09|
+|MMLU (5-Shot)                    |51.35|
+|TruthfulQA (0-shot)              |37.87|
+|Winogrande (5-shot)              |63.85|
+|GSM8k (5-shot)                   |52.24|
+
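As a sanity check, the Avg. row in the table above is the arithmetic mean of the six benchmark scores, rounded to two decimals:

```python
# Benchmark scores from the Open LLM Leaderboard table above
scores = [47.35, 64.09, 51.35, 37.87, 63.85, 52.24]

# Mean rounded to two decimals, matching the reported Avg. row
avg = round(sum(scores) / len(scores), 2)
print(avg)  # 52.79
```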