Felladrin committed on
Commit 01d04e8
1 Parent(s): fbe8884

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +143 -25
README.md CHANGED
@@ -1,33 +1,34 @@
  ---
- license: apache-2.0
- pipeline_tag: text-generation
  language:
- - en
+ - en
+ license: apache-2.0
  tags:
- - pretrained
+ - pretrained
  datasets:
- - Skylion007/openwebtext
- - c4
- - wikimedia/wikipedia
- - tiiuae/falcon-refinedweb
- - izumi-lab/open-text-books
- - togethercomputer/RedPajama-Data-V2
- - databricks/databricks-dolly-15k
- - euclaise/reddit-instruct-curated
- - CohereForAI/aya_dataset
+ - Skylion007/openwebtext
+ - c4
+ - wikimedia/wikipedia
+ - tiiuae/falcon-refinedweb
+ - izumi-lab/open-text-books
+ - togethercomputer/RedPajama-Data-V2
+ - databricks/databricks-dolly-15k
+ - euclaise/reddit-instruct-curated
+ - CohereForAI/aya_dataset
+ pipeline_tag: text-generation
  widget:
- - messages:
- - role: user
- content: Specs of a game about trolls and warriors in a fantasy world.
- - messages:
- - role: user
- content: Reducing waste generation is essential to...
- - messages:
- - role: user
- content: Water, planet, resource, future
- - messages:
- - role: user
- content: Background story of an RPG game about wizards and dragons in a sci-fi world. The story takes place in a...
+ - messages:
+ - role: user
+ content: Specs of a game about trolls and warriors in a fantasy world.
+ - messages:
+ - role: user
+ content: Reducing waste generation is essential to...
+ - messages:
+ - role: user
+ content: Water, planet, resource, future
+ - messages:
+ - role: user
+ content: Background story of an RPG game about wizards and dragons in a sci-fi
+ world. The story takes place in a...
  inference:
  parameters:
  max_new_tokens: 250
@@ -36,6 +37,109 @@ inference:
  top_p: 0.55
  top_k: 35
  repetition_penalty: 1.176
+ model-index:
+ - name: Minueza-32M-Base
+ results:
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: AI2 Reasoning Challenge (25-Shot)
+ type: ai2_arc
+ config: ARC-Challenge
+ split: test
+ args:
+ num_few_shot: 25
+ metrics:
+ - type: acc_norm
+ value: 21.33
+ name: normalized accuracy
+ source:
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Minueza-32M-Base
+ name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: HellaSwag (10-Shot)
+ type: hellaswag
+ split: validation
+ args:
+ num_few_shot: 10
+ metrics:
+ - type: acc_norm
+ value: 26.39
+ name: normalized accuracy
+ source:
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Minueza-32M-Base
+ name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: MMLU (5-Shot)
+ type: cais/mmlu
+ config: all
+ split: test
+ args:
+ num_few_shot: 5
+ metrics:
+ - type: acc
+ value: 24.8
+ name: accuracy
+ source:
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Minueza-32M-Base
+ name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: TruthfulQA (0-shot)
+ type: truthful_qa
+ config: multiple_choice
+ split: validation
+ args:
+ num_few_shot: 0
+ metrics:
+ - type: mc2
+ value: 47.45
+ source:
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Minueza-32M-Base
+ name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: Winogrande (5-shot)
+ type: winogrande
+ config: winogrande_xl
+ split: validation
+ args:
+ num_few_shot: 5
+ metrics:
+ - type: acc
+ value: 53.2
+ name: accuracy
+ source:
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Minueza-32M-Base
+ name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: GSM8k (5-shot)
+ type: gsm8k
+ config: main
+ split: test
+ args:
+ num_few_shot: 5
+ metrics:
+ - type: acc
+ value: 0.38
+ name: accuracy
+ source:
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Minueza-32M-Base
+ name: Open LLM Leaderboard
  ---

  # Minueza-32M-Base
@@ -140,3 +244,17 @@ print(output[0]["generated_text"])
  ## License

  This model is licensed under the [Apache License 2.0](https://huggingface.co/Felladrin/Minueza-32M-Base/resolve/main/license.txt).
+
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Felladrin__Minueza-32M-Base)
+
+ | Metric |Value|
+ |---------------------------------|----:|
+ |Avg. |28.92|
+ |AI2 Reasoning Challenge (25-Shot)|21.33|
+ |HellaSwag (10-Shot) |26.39|
+ |MMLU (5-Shot) |24.80|
+ |TruthfulQA (0-shot) |47.45|
+ |Winogrande (5-shot) |53.20|
+ |GSM8k (5-shot) | 0.38|
+
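
The `inference.parameters` block retained in the front matter above sets the default generation settings for the hosted widget. As a minimal sketch, the same values could be passed to the `transformers` text-generation pipeline; the prompt is taken from the card's widget examples, and the explicit `do_sample=True` flag is an assumption (sampling is implied by the `top_p`/`top_k` settings but not spelled out in the card):

```python
from transformers import pipeline

# Load the model this card describes.
generator = pipeline("text-generation", model="Felladrin/Minueza-32M-Base")

# Generation settings mirror the card's inference.parameters block.
output = generator(
    "Specs of a game about trolls and warriors in a fantasy world.",
    max_new_tokens=250,
    do_sample=True,  # assumption: enables the top_p/top_k sampling listed in the card
    top_p=0.55,
    top_k=35,
    repetition_penalty=1.176,
)
print(output[0]["generated_text"])
```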
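
The `Avg.` row in the added table is the arithmetic mean of the six benchmark scores recorded in the `model-index` metadata, which a quick check confirms (a sketch using only the values added above):

```python
# Benchmark scores added to the model-index metadata (all percentages).
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 21.33,
    "HellaSwag (10-Shot)": 26.39,
    "MMLU (5-Shot)": 24.80,
    "TruthfulQA (0-shot)": 47.45,
    "Winogrande (5-shot)": 53.20,
    "GSM8k (5-shot)": 0.38,
}

average = sum(scores.values()) / len(scores)
print(round(average, 2))  # ~28.92, matching the Avg. row of the table
```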