leaderboard-pr-bot commited on
Commit
3e12abb
1 Parent(s): e7bbf8b

Adding Evaluation Results

Browse files

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1) hide show
  1. README.md +117 -1
README.md CHANGED
@@ -2,6 +2,109 @@
2
  license: apache-2.0
3
  datasets:
4
  - KnutJaegersberg/Auton
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
5
  ---
6
 
7
 
@@ -62,4 +165,17 @@ Essay Example:
62
  Write the original essay for the following summary: A sleeveless dress with a round neck is a great day dress and can be worn for both formal and casual occasions. It has a sweetheart neckline and is made from 100% silk. The dress has a chiffon overlay that covers up the miniskirt and still adds to the class of the dress.
63
  ### Response:
64
  Sleeveless Dress with a Round Neck Essay Casual wear, day dress, more formal dress The garment is a sleeveless dress with a round neck and has a sweetheart neckline. It has sheer detail on the neckline and on the back. These aspects of design make it suitable to be worn for day and evening occasions. Its great detail and chicness make it suitable for more formal events, ....
65
- ```
 
 
 
 
 
 
 
 
 
 
 
 
 
 
2
  license: apache-2.0
3
  datasets:
4
  - KnutJaegersberg/Auton
5
+ model-index:
6
+ - name: Walter-SOLAR-11B
7
+ results:
8
+ - task:
9
+ type: text-generation
10
+ name: Text Generation
11
+ dataset:
12
+ name: AI2 Reasoning Challenge (25-Shot)
13
+ type: ai2_arc
14
+ config: ARC-Challenge
15
+ split: test
16
+ args:
17
+ num_few_shot: 25
18
+ metrics:
19
+ - type: acc_norm
20
+ value: 60.41
21
+ name: normalized accuracy
22
+ source:
23
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=KnutJaegersberg/Walter-SOLAR-11B
24
+ name: Open LLM Leaderboard
25
+ - task:
26
+ type: text-generation
27
+ name: Text Generation
28
+ dataset:
29
+ name: HellaSwag (10-Shot)
30
+ type: hellaswag
31
+ split: validation
32
+ args:
33
+ num_few_shot: 10
34
+ metrics:
35
+ - type: acc_norm
36
+ value: 84.86
37
+ name: normalized accuracy
38
+ source:
39
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=KnutJaegersberg/Walter-SOLAR-11B
40
+ name: Open LLM Leaderboard
41
+ - task:
42
+ type: text-generation
43
+ name: Text Generation
44
+ dataset:
45
+ name: MMLU (5-Shot)
46
+ type: cais/mmlu
47
+ config: all
48
+ split: test
49
+ args:
50
+ num_few_shot: 5
51
+ metrics:
52
+ - type: acc
53
+ value: 64.99
54
+ name: accuracy
55
+ source:
56
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=KnutJaegersberg/Walter-SOLAR-11B
57
+ name: Open LLM Leaderboard
58
+ - task:
59
+ type: text-generation
60
+ name: Text Generation
61
+ dataset:
62
+ name: TruthfulQA (0-shot)
63
+ type: truthful_qa
64
+ config: multiple_choice
65
+ split: validation
66
+ args:
67
+ num_few_shot: 0
68
+ metrics:
69
+ - type: mc2
70
+ value: 44.88
71
+ source:
72
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=KnutJaegersberg/Walter-SOLAR-11B
73
+ name: Open LLM Leaderboard
74
+ - task:
75
+ type: text-generation
76
+ name: Text Generation
77
+ dataset:
78
+ name: Winogrande (5-shot)
79
+ type: winogrande
80
+ config: winogrande_xl
81
+ split: validation
82
+ args:
83
+ num_few_shot: 5
84
+ metrics:
85
+ - type: acc
86
+ value: 79.56
87
+ name: accuracy
88
+ source:
89
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=KnutJaegersberg/Walter-SOLAR-11B
90
+ name: Open LLM Leaderboard
91
+ - task:
92
+ type: text-generation
93
+ name: Text Generation
94
+ dataset:
95
+ name: GSM8k (5-shot)
96
+ type: gsm8k
97
+ config: main
98
+ split: test
99
+ args:
100
+ num_few_shot: 5
101
+ metrics:
102
+ - type: acc
103
+ value: 0.99
104
+ name: accuracy
105
+ source:
106
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=KnutJaegersberg/Walter-SOLAR-11B
107
+ name: Open LLM Leaderboard
108
  ---
109
 
110
 
 
165
  Write the original essay for the following summary: A sleeveless dress with a round neck is a great day dress and can be worn for both formal and casual occasions. It has a sweetheart neckline and is made from 100% silk. The dress has a chiffon overlay that covers up the miniskirt and still adds to the class of the dress.
166
  ### Response:
167
  Sleeveless Dress with a Round Neck Essay Casual wear, day dress, more formal dress The garment is a sleeveless dress with a round neck and has a sweetheart neckline. It has sheer detail on the neckline and on the back. These aspects of design make it suitable to be worn for day and evening occasions. Its great detail and chicness make it suitable for more formal events, ....
168
+ ```
169
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
170
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_KnutJaegersberg__Walter-SOLAR-11B)
171
+
172
+ | Metric |Value|
173
+ |---------------------------------|----:|
174
+ |Avg. |55.95|
175
+ |AI2 Reasoning Challenge (25-Shot)|60.41|
176
+ |HellaSwag (10-Shot) |84.86|
177
+ |MMLU (5-Shot) |64.99|
178
+ |TruthfulQA (0-shot) |44.88|
179
+ |Winogrande (5-shot) |79.56|
180
+ |GSM8k (5-shot) | 0.99|
181
+