Commit 096e543 (parent: b9229f4) by pszemraj: Upload 81m_tied.md

Files changed (1): smol_llama-81M-tied-evals/81m_tied.md (new file, +150 lines)
hf-causal-experimental (pretrained=BEE-spoke-data/smol_llama-81M-tied,trust_remote_code=True,dtype=float), limit: None, provide_description: False, num_fewshot: 0, batch_size: 64

| Task |Version| Metric | Value | |Stderr|
|--------------|------:|--------|------:|---|-----:|
|arc_easy | 0|acc | 0.4162|± |0.0101|
| | |acc_norm| 0.3885|± |0.0100|
|boolq | 1|acc | 0.5832|± |0.0086|
|lambada_openai| 0|ppl |79.4522|± |3.1355|
| | |acc | 0.2523|± |0.0061|
|openbookqa | 0|acc | 0.1540|± |0.0162|
| | |acc_norm| 0.2780|± |0.0201|
|piqa | 0|acc | 0.6050|± |0.0114|
| | |acc_norm| 0.5898|± |0.0115|
|winogrande | 0|acc | 0.5272|± |0.0140|

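Note: these tables look like output from EleutherAI's lm-evaluation-harness; the `hf-causal-experimental` model type and the `hendrycksTest-*` task names below match its pre-0.4 releases. As a rough guide, here is a minimal sketch of such a run using that version's Python API, assuming lm-evaluation-harness v0.3.x is installed; the exact invocation behind these numbers is not recorded, and the remaining runs below differ only in `tasks` and `num_fewshot`:

```python
# Minimal sketch, assuming EleutherAI lm-evaluation-harness v0.3.x;
# not necessarily the exact command that produced the numbers above.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal-experimental",
    model_args=(
        "pretrained=BEE-spoke-data/smol_llama-81M-tied,"
        "trust_remote_code=True,dtype=float"
    ),
    tasks=["arc_easy", "boolq", "lambada_openai",
           "openbookqa", "piqa", "winogrande"],
    num_fewshot=0,
    batch_size=64,
)

# make_table renders the results in the same markdown layout used here
print(evaluator.make_table(results))
```
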
hf-causal-experimental (pretrained=BEE-spoke-data/smol_llama-81M-tied,trust_remote_code=True,dtype=float), limit: None, provide_description: False, num_fewshot: 25, batch_size: 64

| Task |Version| Metric |Value | |Stderr|
|-------------|------:|--------|-----:|---|-----:|
|arc_challenge| 0|acc |0.1672|± |0.0109|
| | |acc_norm|0.2218|± |0.0121|

hf-causal-experimental (pretrained=BEE-spoke-data/smol_llama-81M-tied,trust_remote_code=True,dtype=float), limit: None, provide_description: False, num_fewshot: 10, batch_size: 64

| Task |Version| Metric |Value | |Stderr|
|---------|------:|--------|-----:|---|-----:|
|hellaswag| 0|acc |0.2769|± |0.0045|
| | |acc_norm|0.2923|± |0.0045|

hf-causal-experimental (pretrained=BEE-spoke-data/smol_llama-81M-tied,trust_remote_code=True,dtype=float), limit: None, provide_description: False, num_fewshot: 0, batch_size: 64

| Task |Version|Metric|Value | |Stderr|
|-------------|------:|------|-----:|---|-----:|
|truthfulqa_mc| 1|mc1 |0.2424|± |0.0150|
| | |mc2 |0.4353|± |0.0152|

hf-causal-experimental (pretrained=BEE-spoke-data/smol_llama-81M-tied,trust_remote_code=True,dtype=float), limit: None, provide_description: False, num_fewshot: 5, batch_size: 64

| Task |Version| Metric |Value | |Stderr|
|-------------------------------------------------|------:|--------|-----:|---|-----:|
|hendrycksTest-abstract_algebra | 1|acc |0.2200|± |0.0416|
| | |acc_norm|0.2200|± |0.0416|
|hendrycksTest-anatomy | 1|acc |0.2741|± |0.0385|
| | |acc_norm|0.2741|± |0.0385|
|hendrycksTest-astronomy | 1|acc |0.1776|± |0.0311|
| | |acc_norm|0.1776|± |0.0311|
|hendrycksTest-business_ethics | 1|acc |0.2100|± |0.0409|
| | |acc_norm|0.2100|± |0.0409|
|hendrycksTest-clinical_knowledge | 1|acc |0.2264|± |0.0258|
| | |acc_norm|0.2264|± |0.0258|
|hendrycksTest-college_biology | 1|acc |0.2361|± |0.0355|
| | |acc_norm|0.2361|± |0.0355|
|hendrycksTest-college_chemistry | 1|acc |0.1900|± |0.0394|
| | |acc_norm|0.1900|± |0.0394|
|hendrycksTest-college_computer_science | 1|acc |0.2100|± |0.0409|
| | |acc_norm|0.2100|± |0.0409|
|hendrycksTest-college_mathematics | 1|acc |0.1800|± |0.0386|
| | |acc_norm|0.1800|± |0.0386|
|hendrycksTest-college_medicine | 1|acc |0.2023|± |0.0306|
| | |acc_norm|0.2023|± |0.0306|
|hendrycksTest-college_physics | 1|acc |0.2157|± |0.0409|
| | |acc_norm|0.2157|± |0.0409|
|hendrycksTest-computer_security | 1|acc |0.2400|± |0.0429|
| | |acc_norm|0.2400|± |0.0429|
|hendrycksTest-conceptual_physics | 1|acc |0.2596|± |0.0287|
| | |acc_norm|0.2596|± |0.0287|
|hendrycksTest-econometrics | 1|acc |0.2544|± |0.0410|
| | |acc_norm|0.2544|± |0.0410|
|hendrycksTest-electrical_engineering | 1|acc |0.2207|± |0.0346|
| | |acc_norm|0.2207|± |0.0346|
|hendrycksTest-elementary_mathematics | 1|acc |0.2169|± |0.0212|
| | |acc_norm|0.2169|± |0.0212|
|hendrycksTest-formal_logic | 1|acc |0.1587|± |0.0327|
| | |acc_norm|0.1587|± |0.0327|
|hendrycksTest-global_facts | 1|acc |0.1900|± |0.0394|
| | |acc_norm|0.1900|± |0.0394|
|hendrycksTest-high_school_biology | 1|acc |0.3000|± |0.0261|
| | |acc_norm|0.3000|± |0.0261|
|hendrycksTest-high_school_chemistry | 1|acc |0.2808|± |0.0316|
| | |acc_norm|0.2808|± |0.0316|
|hendrycksTest-high_school_computer_science | 1|acc |0.2800|± |0.0451|
| | |acc_norm|0.2800|± |0.0451|
|hendrycksTest-high_school_european_history | 1|acc |0.2424|± |0.0335|
| | |acc_norm|0.2424|± |0.0335|
|hendrycksTest-high_school_geography | 1|acc |0.2576|± |0.0312|
| | |acc_norm|0.2576|± |0.0312|
|hendrycksTest-high_school_government_and_politics| 1|acc |0.2228|± |0.0300|
| | |acc_norm|0.2228|± |0.0300|
|hendrycksTest-high_school_macroeconomics | 1|acc |0.2231|± |0.0211|
| | |acc_norm|0.2231|± |0.0211|
|hendrycksTest-high_school_mathematics | 1|acc |0.2370|± |0.0259|
| | |acc_norm|0.2370|± |0.0259|
|hendrycksTest-high_school_microeconomics | 1|acc |0.2227|± |0.0270|
| | |acc_norm|0.2227|± |0.0270|
|hendrycksTest-high_school_physics | 1|acc |0.2053|± |0.0330|
| | |acc_norm|0.2053|± |0.0330|
|hendrycksTest-high_school_psychology | 1|acc |0.2110|± |0.0175|
| | |acc_norm|0.2110|± |0.0175|
|hendrycksTest-high_school_statistics | 1|acc |0.4120|± |0.0336|
| | |acc_norm|0.4120|± |0.0336|
|hendrycksTest-high_school_us_history | 1|acc |0.2990|± |0.0321|
| | |acc_norm|0.2990|± |0.0321|
|hendrycksTest-high_school_world_history | 1|acc |0.2658|± |0.0288|
| | |acc_norm|0.2658|± |0.0288|
|hendrycksTest-human_aging | 1|acc |0.2287|± |0.0282|
| | |acc_norm|0.2287|± |0.0282|
|hendrycksTest-human_sexuality | 1|acc |0.2595|± |0.0384|
| | |acc_norm|0.2595|± |0.0384|
|hendrycksTest-international_law | 1|acc |0.2975|± |0.0417|
| | |acc_norm|0.2975|± |0.0417|
|hendrycksTest-jurisprudence | 1|acc |0.2315|± |0.0408|
| | |acc_norm|0.2315|± |0.0408|
|hendrycksTest-logical_fallacies | 1|acc |0.2822|± |0.0354|
| | |acc_norm|0.2822|± |0.0354|
|hendrycksTest-machine_learning | 1|acc |0.2321|± |0.0401|
| | |acc_norm|0.2321|± |0.0401|
|hendrycksTest-management | 1|acc |0.1748|± |0.0376|
| | |acc_norm|0.1748|± |0.0376|
|hendrycksTest-marketing | 1|acc |0.2308|± |0.0276|
| | |acc_norm|0.2308|± |0.0276|
|hendrycksTest-medical_genetics | 1|acc |0.3000|± |0.0461|
| | |acc_norm|0.3000|± |0.0461|
|hendrycksTest-miscellaneous | 1|acc |0.2375|± |0.0152|
| | |acc_norm|0.2375|± |0.0152|
|hendrycksTest-moral_disputes | 1|acc |0.2486|± |0.0233|
| | |acc_norm|0.2486|± |0.0233|
|hendrycksTest-moral_scenarios | 1|acc |0.2425|± |0.0143|
| | |acc_norm|0.2425|± |0.0143|
|hendrycksTest-nutrition | 1|acc |0.2288|± |0.0241|
| | |acc_norm|0.2288|± |0.0241|
|hendrycksTest-philosophy | 1|acc |0.2090|± |0.0231|
| | |acc_norm|0.2090|± |0.0231|
|hendrycksTest-prehistory | 1|acc |0.2377|± |0.0237|
| | |acc_norm|0.2377|± |0.0237|
|hendrycksTest-professional_accounting | 1|acc |0.2234|± |0.0248|
| | |acc_norm|0.2234|± |0.0248|
|hendrycksTest-professional_law | 1|acc |0.2471|± |0.0110|
| | |acc_norm|0.2471|± |0.0110|
|hendrycksTest-professional_medicine | 1|acc |0.4081|± |0.0299|
| | |acc_norm|0.4081|± |0.0299|
|hendrycksTest-professional_psychology | 1|acc |0.2565|± |0.0177|
| | |acc_norm|0.2565|± |0.0177|
|hendrycksTest-public_relations | 1|acc |0.2182|± |0.0396|
| | |acc_norm|0.2182|± |0.0396|
|hendrycksTest-security_studies | 1|acc |0.2408|± |0.0274|
| | |acc_norm|0.2408|± |0.0274|
|hendrycksTest-sociology | 1|acc |0.2338|± |0.0299|
| | |acc_norm|0.2338|± |0.0299|
|hendrycksTest-us_foreign_policy | 1|acc |0.2500|± |0.0435|
| | |acc_norm|0.2500|± |0.0435|
|hendrycksTest-virology | 1|acc |0.2892|± |0.0353|
| | |acc_norm|0.2892|± |0.0353|
|hendrycksTest-world_religions | 1|acc |0.2105|± |0.0313|
| | |acc_norm|0.2105|± |0.0313|

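The 57 `hendrycksTest-*` subtask accuracies above (all near the 25% chance level for four-way multiple choice) are usually reduced to a single MMLU score via an unweighted mean. A sketch of that aggregation, assuming `results` is the dict returned by a `simple_evaluate` call like the one sketched earlier, run with the full MMLU task list:

```python
# Unweighted macro-average over the MMLU subtasks; assumes `results`
# came from lm-evaluation-harness v0.3.x simple_evaluate (see above).
mmlu_acc = {
    task: metrics["acc"]
    for task, metrics in results["results"].items()
    if task.startswith("hendrycksTest-")
}
average = sum(mmlu_acc.values()) / len(mmlu_acc)
print(f"MMLU 5-shot, mean acc over {len(mmlu_acc)} subtasks: {average:.4f}")
```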