pszemraj and leaderboard-pr-bot committed on
Commit f5478a0
1 Parent(s): bba037e

Adding Evaluation Results (#1)


- Adding Evaluation Results (f74de0fad37df5fb0839f07f3593a248afbe31cb)


Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +129 -15
README.md CHANGED
@@ -1,13 +1,17 @@
 ---
+language:
+- en
 license: apache-2.0
-base_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
 tags:
 - bees
 - bzz
 - honey
 - oprah winfrey
+datasets:
+- BEE-spoke-data/bees-internal
 metrics:
 - accuracy
+base_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
 inference:
   parameters:
     max_new_tokens: 64
@@ -31,27 +35,124 @@ widget:
   example_title: Beekeeping PPE
 - text: The term "robbing" in beekeeping refers to the act of
   example_title: Robbing in Beekeeping
-- text: |-
-    Question: What's the primary function of drone bees in a hive?
-    Answer:
+- text: 'Question: What''s the primary function of drone bees in a hive?
+
+    Answer:'
   example_title: Role of Drone Bees
 - text: To harvest honey from a hive, beekeepers often use a device known as a
   example_title: Honey Harvesting Device
-- text: >-
-    Problem: You have a hive that produces 60 pounds of honey per year. You
-    decide to split the hive into two. Assuming each hive now produces at a 70%
-    rate compared to before, how much honey will you get from both hives next
-    year?
-
-    To calculate
+- text: 'Problem: You have a hive that produces 60 pounds of honey per year. You decide
+    to split the hive into two. Assuming each hive now produces at a 70% rate compared
+    to before, how much honey will you get from both hives next year?
+
+    To calculate'
   example_title: Beekeeping Math Problem
 - text: In beekeeping, "swarming" is the process where
   example_title: Swarming
 pipeline_tag: text-generation
-datasets:
-- BEE-spoke-data/bees-internal
-language:
-- en
+model-index:
+- name: TinyLlama-3T-1.1bee
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 33.79
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/TinyLlama-3T-1.1bee
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 60.29
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/TinyLlama-3T-1.1bee
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 25.86
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/TinyLlama-3T-1.1bee
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 38.13
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/TinyLlama-3T-1.1bee
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 60.22
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/TinyLlama-3T-1.1bee
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 0.45
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/TinyLlama-3T-1.1bee
+      name: Open LLM Leaderboard
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -111,4 +212,17 @@ The following hyperparameters were used during training:
 - Transformers 4.36.2
 - Pytorch 2.1.0
 - Datasets 2.16.1
-- Tokenizers 0.15.0
+- Tokenizers 0.15.0
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_BEE-spoke-data__TinyLlama-3T-1.1bee)
+
+|             Metric              |Value|
+|---------------------------------|----:|
+|Avg.                             |36.46|
+|AI2 Reasoning Challenge (25-Shot)|33.79|
+|HellaSwag (10-Shot)              |60.29|
+|MMLU (5-Shot)                    |25.86|
+|TruthfulQA (0-shot)              |38.13|
+|Winogrande (5-shot)              |60.22|
+|GSM8k (5-shot)                   | 0.45|
+
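As a quick sanity check on the table added above, the Avg. row is consistent with a plain arithmetic mean of the six benchmark scores. A minimal Python sketch, with the values copied from this diff (no Hub access required):

```python
# Benchmark scores added to the model card in this commit.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 33.79,
    "HellaSwag (10-Shot)": 60.29,
    "MMLU (5-Shot)": 25.86,
    "TruthfulQA (0-shot)": 38.13,
    "Winogrande (5-shot)": 60.22,
    "GSM8k (5-shot)": 0.45,
}

# The unweighted mean reproduces the reported Avg. of 36.46.
average = sum(scores.values()) / len(scores)
print(f"Avg. = {average:.2f}")  # Avg. = 36.46
```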