leaderboard-pr-bot committed on
Commit
a0c19f1
1 Parent(s): dec16b4

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +124 -22
README.md CHANGED
@@ -1,9 +1,13 @@
1
  ---
2
  license: apache-2.0
3
- base_model: BEE-spoke-data/smol_llama-220M-GQA
4
  tags:
5
  - edu
6
  - continual pretraining
7
  metrics:
8
  - accuracy
9
  inference:
@@ -20,43 +24,128 @@ widget:
20
  example_title: El Microondas
21
  - text: Kennesaw State University is a public
22
  example_title: Kennesaw State University
23
- - text: >-
24
- Bungie Studios is an American video game developer. They are most famous for
25
- developing the award winning Halo series of video games. They also made
26
- Destiny. The studio was founded
27
  example_title: Bungie
28
  - text: The Mona Lisa is a world-renowned painting created by
29
  example_title: Mona Lisa
30
- - text: >-
31
- The Harry Potter series, written by J.K. Rowling, begins with the book
32
- titled
33
  example_title: Harry Potter Series
34
- - text: >-
35
- Question: I have cities, but no houses. I have mountains, but no trees. I
36
  have water, but no fish. What am I?
37
 
38
- Answer:
39
  example_title: Riddle
40
  - text: The process of photosynthesis involves the conversion of
41
  example_title: Photosynthesis
42
- - text: >-
43
- Jane went to the store to buy some groceries. She picked up apples, oranges,
44
  and a loaf of bread. When she got home, she realized she forgot
45
  example_title: Story Continuation
46
- - text: >-
47
- Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph, and
48
- another train leaves Station B at 10:00 AM and travels at 80 mph, when will
49
  they meet if the distance between the stations is 300 miles?
50
 
51
- To determine
52
  example_title: Math Problem
53
  - text: In the context of computer programming, an algorithm is
54
  example_title: Algorithm Definition
55
  pipeline_tag: text-generation
56
- datasets:
57
- - HuggingFaceFW/fineweb-edu
58
- language:
59
- - en
60
  ---
61
 
62
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -167,4 +256,17 @@ The following hyperparameters were used during training:
167
  - Transformers 4.41.1
168
  - Pytorch 2.3.1+cu118
169
  - Datasets 2.19.1
170
- - Tokenizers 0.19.1
1
  ---
2
+ language:
3
+ - en
4
  license: apache-2.0
 
5
  tags:
6
  - edu
7
  - continual pretraining
8
+ base_model: BEE-spoke-data/smol_llama-220M-GQA
9
+ datasets:
10
+ - HuggingFaceFW/fineweb-edu
11
  metrics:
12
  - accuracy
13
  inference:
 
24
  example_title: El Microondas
25
  - text: Kennesaw State University is a public
26
  example_title: Kennesaw State University
27
+ - text: Bungie Studios is an American video game developer. They are most famous for
28
+ developing the award winning Halo series of video games. They also made Destiny.
29
+ The studio was founded
 
30
  example_title: Bungie
31
  - text: The Mona Lisa is a world-renowned painting created by
32
  example_title: Mona Lisa
33
+ - text: The Harry Potter series, written by J.K. Rowling, begins with the book titled
 
 
34
  example_title: Harry Potter Series
35
+ - text: 'Question: I have cities, but no houses. I have mountains, but no trees. I
 
36
  have water, but no fish. What am I?
37
 
38
+ Answer:'
39
  example_title: Riddle
40
  - text: The process of photosynthesis involves the conversion of
41
  example_title: Photosynthesis
42
+ - text: Jane went to the store to buy some groceries. She picked up apples, oranges,
 
43
  and a loaf of bread. When she got home, she realized she forgot
44
  example_title: Story Continuation
45
+ - text: 'Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph,
46
+ and another train leaves Station B at 10:00 AM and travels at 80 mph, when will
 
47
  they meet if the distance between the stations is 300 miles?
48
 
49
+ To determine'
50
  example_title: Math Problem
51
  - text: In the context of computer programming, an algorithm is
52
  example_title: Algorithm Definition
53
  pipeline_tag: text-generation
54
+ model-index:
55
+ - name: smol_llama-220M-GQA-fineweb_edu
56
+ results:
57
+ - task:
58
+ type: text-generation
59
+ name: Text Generation
60
+ dataset:
61
+ name: IFEval (0-Shot)
62
+ type: HuggingFaceH4/ifeval
63
+ args:
64
+ num_few_shot: 0
65
+ metrics:
66
+ - type: inst_level_strict_acc and prompt_level_strict_acc
67
+ value: 19.88
68
+ name: strict accuracy
69
+ source:
70
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA-fineweb_edu
71
+ name: Open LLM Leaderboard
72
+ - task:
73
+ type: text-generation
74
+ name: Text Generation
75
+ dataset:
76
+ name: BBH (3-Shot)
77
+ type: BBH
78
+ args:
79
+ num_few_shot: 3
80
+ metrics:
81
+ - type: acc_norm
82
+ value: 2.31
83
+ name: normalized accuracy
84
+ source:
85
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA-fineweb_edu
86
+ name: Open LLM Leaderboard
87
+ - task:
88
+ type: text-generation
89
+ name: Text Generation
90
+ dataset:
91
+ name: MATH Lvl 5 (4-Shot)
92
+ type: hendrycks/competition_math
93
+ args:
94
+ num_few_shot: 4
95
+ metrics:
96
+ - type: exact_match
97
+ value: 0.0
98
+ name: exact match
99
+ source:
100
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA-fineweb_edu
101
+ name: Open LLM Leaderboard
102
+ - task:
103
+ type: text-generation
104
+ name: Text Generation
105
+ dataset:
106
+ name: GPQA (0-shot)
107
+ type: Idavidrein/gpqa
108
+ args:
109
+ num_few_shot: 0
110
+ metrics:
111
+ - type: acc_norm
112
+ value: 1.23
113
+ name: acc_norm
114
+ source:
115
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA-fineweb_edu
116
+ name: Open LLM Leaderboard
117
+ - task:
118
+ type: text-generation
119
+ name: Text Generation
120
+ dataset:
121
+ name: MuSR (0-shot)
122
+ type: TAUR-Lab/MuSR
123
+ args:
124
+ num_few_shot: 0
125
+ metrics:
126
+ - type: acc_norm
127
+ value: 14.26
128
+ name: acc_norm
129
+ source:
130
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA-fineweb_edu
131
+ name: Open LLM Leaderboard
132
+ - task:
133
+ type: text-generation
134
+ name: Text Generation
135
+ dataset:
136
+ name: MMLU-PRO (5-shot)
137
+ type: TIGER-Lab/MMLU-Pro
138
+ config: main
139
+ split: test
140
+ args:
141
+ num_few_shot: 5
142
+ metrics:
143
+ - type: acc
144
+ value: 1.41
145
+ name: accuracy
146
+ source:
147
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA-fineweb_edu
148
+ name: Open LLM Leaderboard
149
  ---
150
 
151
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 
256
  - Transformers 4.41.1
257
  - Pytorch 2.3.1+cu118
258
  - Datasets 2.19.1
259
+ - Tokenizers 0.19.1
260
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
261
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_BEE-spoke-data__smol_llama-220M-GQA-fineweb_edu)
262
+
263
+ | Metric |Value|
264
+ |-------------------|----:|
265
+ |Avg. | 6.52|
266
+ |IFEval (0-Shot) |19.88|
267
+ |BBH (3-Shot) | 2.31|
268
+ |MATH Lvl 5 (4-Shot)| 0.00|
269
+ |GPQA (0-shot) | 1.23|
270
+ |MuSR (0-shot) |14.26|
271
+ |MMLU-PRO (5-shot) | 1.41|
272
+
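
The `Avg.` row in the table above appears to be the unweighted arithmetic mean of the six benchmark scores. A minimal Python sketch to check this, assuming the mean is taken over the rounded values shown in the table (the names `scores`, `reported_avg`, and `mean_score` are illustrative and not part of the PR):

```python
# Scores copied from the results table added by this PR.
scores = {
    "IFEval (0-Shot)": 19.88,
    "BBH (3-Shot)": 2.31,
    "MATH Lvl 5 (4-Shot)": 0.00,
    "GPQA (0-shot)": 1.23,
    "MuSR (0-shot)": 14.26,
    "MMLU-PRO (5-shot)": 1.41,
}
reported_avg = 6.52  # value of the "Avg." row

# Assumption: the average is an unweighted mean over the six benchmarks.
mean_score = sum(scores.values()) / len(scores)

print(f"computed mean: {mean_score:.3f}, reported: {reported_avg}")

# The leaderboard may average unrounded scores, so compare with a small tolerance.
assert abs(mean_score - reported_avg) < 0.01
```

The tolerance is a deliberate choice: the table only shows two decimal places, so an exact equality check against the reported average could fail on rounding alone.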