kenhktsui committed on
Commit
e5c1dbe
1 Parent(s): c7b0a22

Create README.md

  1. README.md +374 -0
README.md ADDED
@@ -0,0 +1,374 @@
1
+ ---
2
+ language:
3
+ - en
4
+ license: mit
5
+ library_name: transformers
6
+ inference:
7
+ parameters:
8
+ max_new_tokens: 64
9
+ do_sample: true
10
+ temperature: 0.1
11
+ repetition_penalty: 10
12
+ no_repeat_ngram_size: 4
13
+ eta_cutoff: 0.0006
14
+ renormalize_logits: true
15
+ widget:
16
+ - text: My name is El Microondas the Wise, and
17
+ example_title: El Microondas
18
+ - text: Kennesaw State University is a public
19
+ example_title: Kennesaw State University
20
+ - text: >-
21
+ Bungie Studios is an American video game developer. They are most famous for
22
+ developing the award winning Halo series of video games. They also made
23
+ Destiny. The studio was founded
24
+ example_title: Bungie
25
+ - text: The Mona Lisa is a world-renowned painting created by
26
+ example_title: Mona Lisa
27
+ - text: >-
28
+ The Harry Potter series, written by J.K. Rowling, begins with the book
29
+ titled
30
+ example_title: Harry Potter Series
31
+ - text: >-
32
+ Question: I have cities, but no houses. I have mountains, but no trees. I
33
+ have water, but no fish. What am I?
34
+
35
+ Answer:
36
+ example_title: Riddle
37
+ - text: The process of photosynthesis involves the conversion of
38
+ example_title: Photosynthesis
39
+ - text: >-
40
+ Jane went to the store to buy some groceries. She picked up apples, oranges,
41
+ and a loaf of bread. When she got home, she realized she forgot
42
+ example_title: Story Continuation
43
+ - text: >-
44
+ Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph, and
45
+ another train leaves Station B at 10:00 AM and travels at 80 mph, when will
46
+ they meet if the distance between the stations is 300 miles?
47
+
48
+ To determine
49
+ example_title: Math Problem
50
+ - text: In the context of computer programming, an algorithm is
51
+ example_title: Algorithm Definition
52
+ pipeline_tag: text-generation
53
+ model-index:
54
+ - name: nano-phi-192M-v0.1
55
+ results:
56
+ - task:
57
+ type: text-generation
58
+ name: Text Generation
59
+ dataset:
60
+ name: AI2 Reasoning Challenge (25-Shot)
61
+ type: ai2_arc
62
+ config: ARC-Challenge
63
+ split: test
64
+ args:
65
+ num_few_shot: 25
66
+ metrics:
67
+ - type: acc_norm
68
+ value: 24.15
69
+ name: normalized accuracy
70
+ source:
71
+ url: >-
72
+ https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
73
+ name: Open LLM Leaderboard
74
+ - task:
75
+ type: text-generation
76
+ name: Text Generation
77
+ dataset:
78
+ name: HellaSwag (10-Shot)
79
+ type: hellaswag
80
+ split: validation
81
+ args:
82
+ num_few_shot: 10
83
+ metrics:
84
+ - type: acc_norm
85
+ value: 29.99
86
+ name: normalized accuracy
87
+ source:
88
+ url: >-
89
+ https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
90
+ name: Open LLM Leaderboard
91
+ - task:
92
+ type: text-generation
93
+ name: Text Generation
94
+ dataset:
95
+ name: MMLU (5-Shot)
96
+ type: cais/mmlu
97
+ config: all
98
+ split: test
99
+ args:
100
+ num_few_shot: 5
101
+ metrics:
102
+ - type: acc
103
+ value: 25.46
104
+ name: accuracy
105
+ source:
106
+ url: >-
107
+ https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
108
+ name: Open LLM Leaderboard
109
+ - task:
110
+ type: text-generation
111
+ name: Text Generation
112
+ dataset:
113
+ name: TruthfulQA (0-shot)
114
+ type: truthful_qa
115
+ config: multiple_choice
116
+ split: validation
117
+ args:
118
+ num_few_shot: 0
119
+ metrics:
120
+ - type: mc2
121
+ value: 44.3
122
+ source:
123
+ url: >-
124
+ https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
125
+ name: Open LLM Leaderboard
126
+ - task:
127
+ type: text-generation
128
+ name: Text Generation
129
+ dataset:
130
+ name: Winogrande (5-shot)
131
+ type: winogrande
132
+ config: winogrande_xl
133
+ split: validation
134
+ args:
135
+ num_few_shot: 5
136
+ metrics:
137
+ - type: acc
138
+ value: 51.54
139
+ name: accuracy
140
+ source:
141
+ url: >-
142
+ https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
143
+ name: Open LLM Leaderboard
144
+ - task:
145
+ type: text-generation
146
+ name: Text Generation
147
+ dataset:
148
+ name: GSM8k (5-shot)
149
+ type: gsm8k
150
+ config: main
151
+ split: test
152
+ args:
153
+ num_few_shot: 5
154
+ metrics:
155
+ - type: acc
156
+ value: 0
157
+ name: accuracy
158
+ source:
159
+ url: >-
160
+ https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
161
+ name: Open LLM Leaderboard
162
+ datasets:
163
+ - kenhktsui/minipile_quality_score_v1
164
+ - kenhktsui/simple_wikipedia_LM_quality_score_v1
165
+ - kenhktsui/refinedweb-3m_quality_score_v1
166
+ - kenhktsui/TM-DATA_quality_score_v1
167
+ - kenhktsui/openwebtext_quality_score_v1
168
+ - HuggingFaceTB/cosmopedia
169
+ ---
170
+
171
+
172
+ # Model Card for nano-phi-192M-v0.1
173
+ This model continues the work of [kenhktsui/nano-phi-115M-v0.1](https://huggingface.co/kenhktsui/nano-phi-115M-v0.1).
174
+ The model is not aligned.
175
+
176
+ Major differences:
177
+ - larger tokenizer vocabulary
178
+ - [HuggingFaceTB/cosmopedia](https://huggingface.co/datasets/HuggingFaceTB/cosmopedia) added as a training dataset
179
+ - more training tokens: 19B vs. 7B
180
+
181
+
182
+ ## How to use
183
+ To use the model, you need `transformers` version 4.37.2 or later:
184
+ ```
185
+ pip install "transformers>=4.37.2"
186
+ ```
187
+
188
+ ```
189
+ # Use a pipeline as a high-level helper
190
+ from transformers import pipeline
191
+
192
+ pipe = pipeline("text-generation", model="kenhktsui/nano-phi-192M-v0.1")
193
+ pipe("I am a machine learning researcher. I work on", max_new_tokens=50, repetition_penalty=10.0)
194
+ ```
195
+
196
+ ## Model and training details
197
+ - model:
198
+   - hidden_size: 768
199
+   - num_key_value_heads: 8 (grouped query attention)
200
+   - num_attention_heads: 24
201
+   - num_hidden_layers: 6
202
+   - context length: 1024
203
+   - total params: 192M
204
+ - training:
205
+   - global steps: 36,000
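
The attention settings above imply a particular head layout. The sketch below works it out, assuming the common convention that the per-head dimension equals `hidden_size / num_attention_heads`; the card does not state this, and the variable names are illustrative only.

```python
# Values copied from the list above.
hidden_size = 768
num_attention_heads = 24
num_key_value_heads = 8

# Assumption (not stated in the card): each head has
# hidden_size / num_attention_heads dimensions.
head_dim = hidden_size // num_attention_heads

# With grouped query attention, several query heads share one key/value head.
queries_per_kv_head = num_attention_heads // num_key_value_heads

# Width of the key (or value) projection under the same assumption.
kv_proj_width = num_key_value_heads * head_dim

print(f"head_dim={head_dim}")
print(f"query heads per KV head={queries_per_kv_head}")
print(f"key/value projection width={kv_proj_width}")
```

Under that assumption, the key and value projections are one third as wide as the query projection, which shrinks the key/value cache by the same factor.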
206
+
207
+
208
+
209
+ ## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
210
+
211
+
212
+ | Metric                |kenhktsui/nano-phi-192M-v0.1 |[kenhktsui/nano-phi-115M-v0.1](https://huggingface.co/kenhktsui/nano-phi-115M-v0.1)|[microsoft/phi-2](https://huggingface.co/microsoft/phi-2) (Reproduced)|
213
+ |-----------------------|---------------------------|---------------------------|---------------------------|
214
+ | Avg. |29.24 | 28.68 |61.53 |
215
+ | ARC (25-shot) |24.15 | 21.93 |61.52 |
216
+ | HellaSwag (10-shot) | 29.99 | 27.87 |75.13 |
217
+ | MMLU (5-shot) |25.46 | 25.30 |58.23 |
218
+ | TruthfulQA (0-shot) |44.30 | 46.01 |44.46 |
219
+ | Winogrande (5-shot) |51.54 | 50.99 |74.51 |
220
+ | GSM8K (5-shot) |0.0 | 0.0 |55.34 |
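
The "Avg." row appears to be the unweighted mean of the six benchmark scores. A minimal check, using the values from the table above (the `average` helper and the dictionary layout are illustrative, not part of any evaluation tool):

```python
# Benchmark order: ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K.
scores = {
    "nano-phi-192M-v0.1": [24.15, 29.99, 25.46, 44.30, 51.54, 0.0],
    "nano-phi-115M-v0.1": [21.93, 27.87, 25.30, 46.01, 50.99, 0.0],
    "phi-2 (Reproduced)": [61.52, 75.13, 58.23, 44.46, 74.51, 55.34],
}


def average(values):
    """Unweighted mean of the benchmark scores."""
    return sum(values) / len(values)


averages = {name: f"{average(vals):.2f}" for name, vals in scores.items()}
for name, avg in averages.items():
    print(f"{name}: {avg}")
```

The printed values agree with the "Avg." row for all three models.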
221
+
222
+ Details:
223
+
224
+ hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: 8
225
+ | Task |Version| Metric |Value | |Stderr|
226
+ |--------|------:|--------|-----:|---|-----:|
227
+ |arc_easy| 0|acc |0.4596|± |0.0102|
228
+ | | |acc_norm|0.4070|± |0.0101|
229
+
230
+ hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 25, batch_size: 8
231
+ | Task |Version| Metric |Value | |Stderr|
232
+ |-------------|------:|--------|-----:|---|-----:|
233
+ |arc_challenge| 0|acc |0.1911|± |0.0115|
234
+ | | |acc_norm|0.2415|± |0.0125|
235
+
236
+ hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 10, batch_size: 8
237
+ | Task |Version| Metric |Value | |Stderr|
238
+ |---------|------:|--------|-----:|---|-----:|
239
+ |hellaswag| 0|acc |0.2833|± |0.0045|
240
+ | | |acc_norm|0.2999|± |0.0046|
241
+
242
+ hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: 8
243
+ | Task |Version|Metric|Value | |Stderr|
244
+ |-------------|------:|------|-----:|---|-----:|
245
+ |truthfulqa_mc| 1|mc1 |0.2583|± |0.0153|
246
+ | | |mc2 |0.4430|± |0.0152|
247
+
248
+ hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 5, batch_size: 8
249
+ | Task |Version| Metric |Value | |Stderr|
250
+ |-------------------------------------------------|------:|--------|-----:|---|-----:|
251
+ |hendrycksTest-abstract_algebra | 1|acc |0.2200|± |0.0416|
252
+ | | |acc_norm|0.2200|± |0.0416|
253
+ |hendrycksTest-anatomy | 1|acc |0.2593|± |0.0379|
254
+ | | |acc_norm|0.2593|± |0.0379|
255
+ |hendrycksTest-astronomy | 1|acc |0.1711|± |0.0306|
256
+ | | |acc_norm|0.1711|± |0.0306|
257
+ |hendrycksTest-business_ethics | 1|acc |0.2400|± |0.0429|
258
+ | | |acc_norm|0.2400|± |0.0429|
259
+ |hendrycksTest-clinical_knowledge | 1|acc |0.2566|± |0.0269|
260
+ | | |acc_norm|0.2566|± |0.0269|
261
+ |hendrycksTest-college_biology | 1|acc |0.2639|± |0.0369|
262
+ | | |acc_norm|0.2639|± |0.0369|
263
+ |hendrycksTest-college_chemistry | 1|acc |0.1800|± |0.0386|
264
+ | | |acc_norm|0.1800|± |0.0386|
265
+ |hendrycksTest-college_computer_science | 1|acc |0.3300|± |0.0473|
266
+ | | |acc_norm|0.3300|± |0.0473|
267
+ |hendrycksTest-college_mathematics | 1|acc |0.3000|± |0.0461|
268
+ | | |acc_norm|0.3000|± |0.0461|
269
+ |hendrycksTest-college_medicine | 1|acc |0.2023|± |0.0306|
270
+ | | |acc_norm|0.2023|± |0.0306|
271
+ |hendrycksTest-college_physics | 1|acc |0.2843|± |0.0449|
272
+ | | |acc_norm|0.2843|± |0.0449|
273
+ |hendrycksTest-computer_security | 1|acc |0.2200|± |0.0416|
274
+ | | |acc_norm|0.2200|± |0.0416|
275
+ |hendrycksTest-conceptual_physics | 1|acc |0.2511|± |0.0283|
276
+ | | |acc_norm|0.2511|± |0.0283|
277
+ |hendrycksTest-econometrics | 1|acc |0.2807|± |0.0423|
278
+ | | |acc_norm|0.2807|± |0.0423|
279
+ |hendrycksTest-electrical_engineering | 1|acc |0.2897|± |0.0378|
280
+ | | |acc_norm|0.2897|± |0.0378|
281
+ |hendrycksTest-elementary_mathematics | 1|acc |0.2804|± |0.0231|
282
+ | | |acc_norm|0.2804|± |0.0231|
283
+ |hendrycksTest-formal_logic | 1|acc |0.2143|± |0.0367|
284
+ | | |acc_norm|0.2143|± |0.0367|
285
+ |hendrycksTest-global_facts | 1|acc |0.1700|± |0.0378|
286
+ | | |acc_norm|0.1700|± |0.0378|
287
+ |hendrycksTest-high_school_biology | 1|acc |0.3226|± |0.0266|
288
+ | | |acc_norm|0.3226|± |0.0266|
289
+ |hendrycksTest-high_school_chemistry | 1|acc |0.2759|± |0.0314|
290
+ | | |acc_norm|0.2759|± |0.0314|
291
+ |hendrycksTest-high_school_computer_science | 1|acc |0.2700|± |0.0446|
292
+ | | |acc_norm|0.2700|± |0.0446|
293
+ |hendrycksTest-high_school_european_history | 1|acc |0.2606|± |0.0343|
294
+ | | |acc_norm|0.2606|± |0.0343|
295
+ |hendrycksTest-high_school_geography | 1|acc |0.3081|± |0.0329|
296
+ | | |acc_norm|0.3081|± |0.0329|
297
+ |hendrycksTest-high_school_government_and_politics| 1|acc |0.3627|± |0.0347|
298
+ | | |acc_norm|0.3627|± |0.0347|
299
+ |hendrycksTest-high_school_macroeconomics | 1|acc |0.2641|± |0.0224|
300
+ | | |acc_norm|0.2641|± |0.0224|
301
+ |hendrycksTest-high_school_mathematics | 1|acc |0.2630|± |0.0268|
302
+ | | |acc_norm|0.2630|± |0.0268|
303
+ |hendrycksTest-high_school_microeconomics | 1|acc |0.3403|± |0.0308|
304
+ | | |acc_norm|0.3403|± |0.0308|
305
+ |hendrycksTest-high_school_physics | 1|acc |0.3113|± |0.0378|
306
+ | | |acc_norm|0.3113|± |0.0378|
307
+ |hendrycksTest-high_school_psychology | 1|acc |0.2716|± |0.0191|
308
+ | | |acc_norm|0.2716|± |0.0191|
309
+ |hendrycksTest-high_school_statistics | 1|acc |0.4491|± |0.0339|
310
+ | | |acc_norm|0.4491|± |0.0339|
311
+ |hendrycksTest-high_school_us_history | 1|acc |0.2402|± |0.0300|
312
+ | | |acc_norm|0.2402|± |0.0300|
313
+ |hendrycksTest-high_school_world_history | 1|acc |0.2363|± |0.0277|
314
+ | | |acc_norm|0.2363|± |0.0277|
315
+ |hendrycksTest-human_aging | 1|acc |0.2197|± |0.0278|
316
+ | | |acc_norm|0.2197|± |0.0278|
317
+ |hendrycksTest-human_sexuality | 1|acc |0.2824|± |0.0395|
318
+ | | |acc_norm|0.2824|± |0.0395|
319
+ |hendrycksTest-international_law | 1|acc |0.2479|± |0.0394|
320
+ | | |acc_norm|0.2479|± |0.0394|
321
+ |hendrycksTest-jurisprudence | 1|acc |0.2037|± |0.0389|
322
+ | | |acc_norm|0.2037|± |0.0389|
323
+ |hendrycksTest-logical_fallacies | 1|acc |0.2393|± |0.0335|
324
+ | | |acc_norm|0.2393|± |0.0335|
325
+ |hendrycksTest-machine_learning | 1|acc |0.1875|± |0.0370|
326
+ | | |acc_norm|0.1875|± |0.0370|
327
+ |hendrycksTest-management | 1|acc |0.2039|± |0.0399|
328
+ | | |acc_norm|0.2039|± |0.0399|
329
+ |hendrycksTest-marketing | 1|acc |0.1795|± |0.0251|
330
+ | | |acc_norm|0.1795|± |0.0251|
331
+ |hendrycksTest-medical_genetics | 1|acc |0.3000|± |0.0461|
332
+ | | |acc_norm|0.3000|± |0.0461|
333
+ |hendrycksTest-miscellaneous | 1|acc |0.2644|± |0.0158|
334
+ | | |acc_norm|0.2644|± |0.0158|
335
+ |hendrycksTest-moral_disputes | 1|acc |0.2225|± |0.0224|
336
+ | | |acc_norm|0.2225|± |0.0224|
337
+ |hendrycksTest-moral_scenarios | 1|acc |0.2726|± |0.0149|
338
+ | | |acc_norm|0.2726|± |0.0149|
339
+ |hendrycksTest-nutrition | 1|acc |0.2353|± |0.0243|
340
+ | | |acc_norm|0.2353|± |0.0243|
341
+ |hendrycksTest-philosophy | 1|acc |0.2283|± |0.0238|
342
+ | | |acc_norm|0.2283|± |0.0238|
343
+ |hendrycksTest-prehistory | 1|acc |0.2099|± |0.0227|
344
+ | | |acc_norm|0.2099|± |0.0227|
345
+ |hendrycksTest-professional_accounting | 1|acc |0.2411|± |0.0255|
346
+ | | |acc_norm|0.2411|± |0.0255|
347
+ |hendrycksTest-professional_law | 1|acc |0.2458|± |0.0110|
348
+ | | |acc_norm|0.2458|± |0.0110|
349
+ |hendrycksTest-professional_medicine | 1|acc |0.3897|± |0.0296|
350
+ | | |acc_norm|0.3897|± |0.0296|
351
+ |hendrycksTest-professional_psychology | 1|acc |0.2141|± |0.0166|
352
+ | | |acc_norm|0.2141|± |0.0166|
353
+ |hendrycksTest-public_relations | 1|acc |0.1818|± |0.0369|
354
+ | | |acc_norm|0.1818|± |0.0369|
355
+ |hendrycksTest-security_studies | 1|acc |0.2490|± |0.0277|
356
+ | | |acc_norm|0.2490|± |0.0277|
357
+ |hendrycksTest-sociology | 1|acc |0.2537|± |0.0308|
358
+ | | |acc_norm|0.2537|± |0.0308|
359
+ |hendrycksTest-us_foreign_policy | 1|acc |0.2900|± |0.0456|
360
+ | | |acc_norm|0.2900|± |0.0456|
361
+ |hendrycksTest-virology | 1|acc |0.1807|± |0.0300|
362
+ | | |acc_norm|0.1807|± |0.0300|
363
+ |hendrycksTest-world_religions | 1|acc |0.1813|± |0.0295|
364
+ | | |acc_norm|0.1813|± |0.0295|
365
+
366
+ hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 5, batch_size: 8
367
+ | Task |Version|Metric|Value | |Stderr|
368
+ |----------|------:|------|-----:|---|-----:|
369
+ |winogrande| 0|acc |0.5154|± | 0.014|
370
+
371
+ hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 5, batch_size: 8
372
+ |Task |Version|Metric|Value| |Stderr|
373
+ |-----|------:|------|----:|---|-----:|
374
+ |gsm8k| 0|acc | 0|± | 0|