Text Generation
Transformers
Safetensors
English
phi
custom_code
Inference Endpoints
text-generation-inference
kenhktsui committed e7cd280 (1 parent: c318401)

Update README.md

Files changed (1)
  1. README.md +243 -8
README.md CHANGED
@@ -1,13 +1,250 @@
  ---
  library_name: transformers
- tags: []
  ---

- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-


  ## Model Details

@@ -196,6 +433,4 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]

  ## Model Card Contact

- [More Information Needed]
-
-
 
  ---
  library_name: transformers
+ language:
+ - en
+ inference:
+   parameters:
+     max_new_tokens: 64
+     do_sample: true
+     temperature: 0.8
+     repetition_penalty: 1.15
+     no_repeat_ngram_size: 4
+     eta_cutoff: 0.0006
+     renormalize_logits: true
+ widget:
+ - text: My name is El Microondas the Wise, and
+   example_title: El Microondas
+ - text: Kennesaw State University is a public
+   example_title: Kennesaw State University
+ - text: >-
+     Bungie Studios is an American video game developer. They are most famous for
+     developing the award winning Halo series of video games. They also made
+     Destiny. The studio was founded
+   example_title: Bungie
+ - text: The Mona Lisa is a world-renowned painting created by
+   example_title: Mona Lisa
+ - text: >-
+     The Harry Potter series, written by J.K. Rowling, begins with the book
+     titled
+   example_title: Harry Potter Series
+ - text: >-
+     Question: I have cities, but no houses. I have mountains, but no trees. I
+     have water, but no fish. What am I?
+
+     Answer:
+   example_title: Riddle
+ - text: The process of photosynthesis involves the conversion of
+   example_title: Photosynthesis
+ - text: >-
+     Jane went to the store to buy some groceries. She picked up apples, oranges,
+     and a loaf of bread. When she got home, she realized she forgot
+   example_title: Story Continuation
+ - text: >-
+     Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph, and
+     another train leaves Station B at 10:00 AM and travels at 80 mph, when will
+     they meet if the distance between the stations is 300 miles?
+
+     To determine
+   example_title: Math Problem
+ - text: In the context of computer programming, an algorithm is
+   example_title: Algorithm Definition
+ pipeline_tag: text-generation
+ datasets:
+ - JeanKaddour/minipile
+ - pszemraj/simple_wikipedia_LM
+ - mattymchen/refinedweb-3m
+ - Locutusque/TM-DATA
+ - Skylion007/openwebtext
  ---
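The widget's `inference.parameters` apply several logit transformations before each sampled token. Below is a minimal, stdlib-only sketch of what each setting does to a single next-token distribution. It is loosely modeled on the corresponding Hugging Face `transformers` generation options and is a simplified illustration, not the library's code. The toy logits and token ids are hypothetical, and the 64-step loop implied by `max_new_tokens` is left out.

```python
import math
import random

# Widget inference parameters, copied from the front matter above.
PARAMS = {
    "temperature": 0.8,
    "repetition_penalty": 1.15,
    "no_repeat_ngram_size": 4,
    "eta_cutoff": 0.0006,
}


def apply_repetition_penalty(logits, context, penalty):
    """Penalize every token id already in the context (prompt + output):
    positive logits are divided by the penalty, negative ones multiplied."""
    out = list(logits)
    for tok in set(context):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def banned_by_ngram(context, n):
    """Token ids that would repeat an n-gram already present in the context."""
    if n <= 0 or len(context) < n - 1:
        return set()
    prefix = tuple(context[len(context) - (n - 1):])
    banned = set()
    for i in range(len(context) - n + 1):
        if tuple(context[i:i + n - 1]) == prefix:
            banned.add(context[i + n - 1])
    return banned


def softmax(logits):
    m = max(x for x in logits if x != -math.inf)
    exps = [math.exp(x - m) for x in logits]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


def eta_filter(logits, epsilon):
    """Eta truncation: mask tokens whose probability falls below
    eta = min(epsilon, sqrt(epsilon) * exp(-entropy)); always keep the argmax."""
    probs = softmax(logits)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    eta = min(epsilon, math.sqrt(epsilon) * math.exp(-entropy))
    best = max(range(len(probs)), key=probs.__getitem__)
    return [x if probs[i] >= eta or i == best else -math.inf
            for i, x in enumerate(logits)]


def next_token_distribution(logits, context, params=PARAMS):
    """Penalties and n-gram bans first, then temperature and eta warping;
    the final softmax plays the role of renormalize_logits."""
    logits = apply_repetition_penalty(logits, context, params["repetition_penalty"])
    for tok in banned_by_ngram(context, params["no_repeat_ngram_size"]):
        logits[tok] = -math.inf
    logits = [x / params["temperature"] for x in logits]
    logits = eta_filter(logits, params["eta_cutoff"])
    return softmax(logits)


if __name__ == "__main__":
    toy_logits = [2.0, 1.5, 0.3, -1.0, 0.0]   # hypothetical 5-token vocabulary
    toy_context = [0, 1, 2, 0, 1, 2, 0, 1]    # hypothetical prompt + output ids
    probs = next_token_distribution(toy_logits, toy_context)
    # do_sample: true -> draw from the distribution instead of taking the argmax
    print(random.choices(range(len(probs)), weights=probs)[0])
```

In the toy context, the last three ids `(2, 0, 1)` already appeared followed by `2`. With `no_repeat_ngram_size: 4`, token `2` therefore gets probability 0 no matter what its logit is.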


+ # Model Card for nano-phi-115M-control-v0.1
+
+ Inspired by [Phi2](https://huggingface.co/microsoft/phi-2) and by open-source small language model efforts such as [smol_llama-101M-GQA](https://huggingface.co/BEE-spoke-data/smol_llama-101M-GQA).
+ Pre-trained from scratch for 7B training tokens on a dataset of 0.6B tokens.
+ This model serves as a control for [kenhktsui/nano-phi-115M-v0.1](https://huggingface.co/kenhktsui/nano-phi-115M-v0.1), which applies a quality filter to its dataset, resulting in a smaller dataset.
+ Training took just 2 days 4 hours in Colab on a single A100 40GB (~US$100).
+ Given its number of training tokens and the size of its training data, it achieves fairly competitive evaluation results.
+ No alignment has been done yet.
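Because 7B training tokens were drawn from a 0.6B-token dataset, training must have made many passes over the same data. The quick back-of-the-envelope check below assumes the quoted "7B", "0.6B" and "2d 4h" figures are exact, although the card rounds them, so the results are only approximate.

```python
# Figures as quoted in the card (rounded there, so results are approximate).
TRAIN_TOKENS = 7e9             # tokens processed during pre-training
DATASET_TOKENS = 0.6e9         # size of the training dataset in tokens
WALL_CLOCK_HOURS = 2 * 24 + 4  # "2d 4h" on one A100 40GB

epochs = TRAIN_TOKENS / DATASET_TOKENS
tokens_per_second = TRAIN_TOKENS / (WALL_CLOCK_HOURS * 3600)

print(f"passes over the dataset: {epochs:.1f}")                  # 11.7
print(f"average throughput: {tokens_per_second:,.0f} tokens/s")  # 37,393
```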
+
+ ## Some metrics
+ - model:
+   - hidden_size: 768
+   - num_key_value_heads: 8 (grouped-query attention)
+   - num_attention_heads: 24
+   - num_hidden_layers: 6
+   - context length: 1024
+   - total params: 115M
+ - training:
+   - global steps: 14,000
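Several useful quantities follow directly from the figures in this list. The sketch below works them out. It assumes the common Hugging Face `repeat_kv`-style grouping, in which consecutive query heads share one key/value head; the custom `phi` code may differ. It also assumes the 7B-token total is spread evenly over the 14,000 steps. The sequences-per-step figure further assumes fully packed 1024-token sequences, which the card does not state.

```python
# Architecture figures from the "Some metrics" list.
HIDDEN_SIZE = 768
NUM_ATTENTION_HEADS = 24
NUM_KEY_VALUE_HEADS = 8
NUM_HIDDEN_LAYERS = 6
CONTEXT_LENGTH = 1024
GLOBAL_STEPS = 14_000
TRAIN_TOKENS = 7e9  # rounded figure from the card's summary

head_dim = HIDDEN_SIZE // NUM_ATTENTION_HEADS                 # 32
queries_per_kv = NUM_ATTENTION_HEADS // NUM_KEY_VALUE_HEADS   # 3

# Assumed grouping: query head q attends with key/value head q // queries_per_kv.
kv_head_for_query = [q // queries_per_kv for q in range(NUM_ATTENTION_HEADS)]

# Cached key + value entries per token across all layers, GQA vs. full multi-head.
kv_cache_per_token_gqa = 2 * NUM_HIDDEN_LAYERS * NUM_KEY_VALUE_HEADS * head_dim  # 3072
kv_cache_per_token_mha = 2 * NUM_HIDDEN_LAYERS * NUM_ATTENTION_HEADS * head_dim  # 9216

# Average tokens per optimizer step, and the implied number of full sequences.
tokens_per_step = TRAIN_TOKENS / GLOBAL_STEPS             # 500,000
sequences_per_step = tokens_per_step / CONTEXT_LENGTH     # ~488

print(head_dim, queries_per_kv, kv_head_for_query[:6])
print(kv_cache_per_token_gqa, kv_cache_per_token_mha)
print(f"{tokens_per_step:,.0f} tokens/step, ~{sequences_per_step:.0f} sequences/step")
```

With 8 key/value heads instead of 24, the KV cache is a third of the size a standard multi-head layout would need.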
+
+ ## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+
+ | Metric | Value |
+ |-----------------------|---------------------------|
+ | Avg. | 28.75 |
+ | ARC (25-shot) | 21.67 |
+ | HellaSwag (10-shot) | 26.89 |
+ | MMLU (5-shot) | 24.76 |
+ | TruthfulQA (0-shot) | 47.69 |
+ | Winogrande (5-shot) | 51.46 |
+ | GSM8K (5-shot) | 0.0 |
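The Avg. row is the plain mean of the six scores. The sketch below checks that. It also maps each column, except MMLU (an average over 57 subtasks), to the detailed lm-evaluation-harness metric it appears to come from. That mapping is a reading of the tables that follow, not something the card states. Averaging the rounded two-decimal scores gives 28.745, which is consistent with the reported 28.75.

```python
# Leaderboard column -> value in the table above (percent).
LEADERBOARD = {
    "ARC (25-shot)": 21.67,
    "HellaSwag (10-shot)": 26.89,
    "MMLU (5-shot)": 24.76,
    "TruthfulQA (0-shot)": 47.69,
    "Winogrande (5-shot)": 51.46,
    "GSM8K (5-shot)": 0.0,
}

# Apparent source of each column in the detailed tables: (task, metric, value).
SOURCE_METRIC = {
    "ARC (25-shot)": ("arc_challenge", "acc_norm", 0.2167),
    "HellaSwag (10-shot)": ("hellaswag", "acc_norm", 0.2689),
    "TruthfulQA (0-shot)": ("truthfulqa_mc", "mc2", 0.4769),
    "Winogrande (5-shot)": ("winogrande", "acc", 0.5146),
    "GSM8K (5-shot)": ("gsm8k", "acc", 0.0),
}

# Each detailed value, as a percentage, should match its leaderboard column.
for column, (task, metric, value) in SOURCE_METRIC.items():
    assert abs(value * 100 - LEADERBOARD[column]) < 1e-6, (column, task, metric)

average = sum(LEADERBOARD.values()) / len(LEADERBOARD)
print(f"mean of the six rounded scores: {average:.3f}")  # 28.745
```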
+
+ Details:
+
+ hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/checkpoint-ehgq969i:v13,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: 16
+ | Task |Version| Metric |Value | |Stderr|
+ |--------|------:|--------|-----:|---|-----:|
+ |arc_easy| 0|acc |0.3973|± |0.0100|
+ | | |acc_norm|0.3531|± |0.0098|
+
+ hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/checkpoint-ehgq969i:v13,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 25, batch_size: 16
+ | Task |Version| Metric |Value | |Stderr|
+ |-------------|------:|--------|-----:|---|-----:|
+ |arc_challenge| 0|acc |0.1843|± |0.0113|
+ | | |acc_norm|0.2167|± |0.0120|
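The Stderr column is consistent with the sample standard error of a mean of 0/1 outcomes, sqrt(acc * (1 - acc) / (n - 1)). The sketch below makes that assumption. The split sizes n are not stated in the card: 2,376 and 1,172 are assumed here to be the standard ARC-Easy and ARC-Challenge test sets. With those values it reproduces all four ARC error bars above.

```python
import math


def accuracy_stderr(acc: float, n: int) -> float:
    """Standard error of the mean of n Bernoulli outcomes, using the
    sample (n - 1) variance: sqrt(acc * (1 - acc) / (n - 1))."""
    return math.sqrt(acc * (1 - acc) / (n - 1))


# Assumed split sizes (standard ARC test sets); not given in the card.
ARC_EASY_TEST = 2376
ARC_CHALLENGE_TEST = 1172

print(f"{accuracy_stderr(0.3973, ARC_EASY_TEST):.4f}")       # 0.0100
print(f"{accuracy_stderr(0.3531, ARC_EASY_TEST):.4f}")       # 0.0098
print(f"{accuracy_stderr(0.1843, ARC_CHALLENGE_TEST):.4f}")  # 0.0113
print(f"{accuracy_stderr(0.2167, ARC_CHALLENGE_TEST):.4f}")  # 0.0120
```

At a few thousand examples the error bars are around one percentage point. Differences of that size between small models on ARC are therefore within noise.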
+
+ hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/checkpoint-ehgq969i:v13,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 10, batch_size: 16
+ | Task |Version| Metric |Value | |Stderr|
+ |---------|------:|--------|-----:|---|-----:|
+ |hellaswag| 0|acc |0.2682|± |0.0044|
+ | | |acc_norm|0.2689|± |0.0044|
+
+ hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/checkpoint-ehgq969i:v13,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: 16
+ | Task |Version|Metric|Value | |Stderr|
+ |-------------|------:|------|-----:|---|-----:|
+ |truthfulqa_mc| 1|mc1 |0.2619|± |0.0154|
+ | | |mc2 |0.4769|± |0.0156|
+
+ hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/checkpoint-ehgq969i:v13,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 5, batch_size: 16
+ | Task |Version| Metric |Value | |Stderr|
+ |-------------------------------------------------|------:|--------|-----:|---|-----:|
+ |hendrycksTest-abstract_algebra | 1|acc |0.2200|± |0.0416|
+ | | |acc_norm|0.2200|± |0.0416|
+ |hendrycksTest-anatomy | 1|acc |0.3333|± |0.0407|
+ | | |acc_norm|0.3333|± |0.0407|
+ |hendrycksTest-astronomy | 1|acc |0.2895|± |0.0369|
+ | | |acc_norm|0.2895|± |0.0369|
+ |hendrycksTest-business_ethics | 1|acc |0.2000|± |0.0402|
+ | | |acc_norm|0.2000|± |0.0402|
+ |hendrycksTest-clinical_knowledge | 1|acc |0.2189|± |0.0254|
+ | | |acc_norm|0.2189|± |0.0254|
+ |hendrycksTest-college_biology | 1|acc |0.2222|± |0.0348|
+ | | |acc_norm|0.2222|± |0.0348|
+ |hendrycksTest-college_chemistry | 1|acc |0.1700|± |0.0378|
+ | | |acc_norm|0.1700|± |0.0378|
+ |hendrycksTest-college_computer_science | 1|acc |0.3000|± |0.0461|
+ | | |acc_norm|0.3000|± |0.0461|
+ |hendrycksTest-college_mathematics | 1|acc |0.2500|± |0.0435|
+ | | |acc_norm|0.2500|± |0.0435|
+ |hendrycksTest-college_medicine | 1|acc |0.1965|± |0.0303|
+ | | |acc_norm|0.1965|± |0.0303|
+ |hendrycksTest-college_physics | 1|acc |0.2353|± |0.0422|
+ | | |acc_norm|0.2353|± |0.0422|
+ |hendrycksTest-computer_security | 1|acc |0.2000|± |0.0402|
+ | | |acc_norm|0.2000|± |0.0402|
+ |hendrycksTest-conceptual_physics | 1|acc |0.2043|± |0.0264|
+ | | |acc_norm|0.2043|± |0.0264|
+ |hendrycksTest-econometrics | 1|acc |0.2456|± |0.0405|
+ | | |acc_norm|0.2456|± |0.0405|
+ |hendrycksTest-electrical_engineering | 1|acc |0.2621|± |0.0366|
+ | | |acc_norm|0.2621|± |0.0366|
+ |hendrycksTest-elementary_mathematics | 1|acc |0.2566|± |0.0225|
+ | | |acc_norm|0.2566|± |0.0225|
+ |hendrycksTest-formal_logic | 1|acc |0.1587|± |0.0327|
+ | | |acc_norm|0.1587|± |0.0327|
+ |hendrycksTest-global_facts | 1|acc |0.1600|± |0.0368|
+ | | |acc_norm|0.1600|± |0.0368|
+ |hendrycksTest-high_school_biology | 1|acc |0.3226|± |0.0266|
+ | | |acc_norm|0.3226|± |0.0266|
+ |hendrycksTest-high_school_chemistry | 1|acc |0.2956|± |0.0321|
+ | | |acc_norm|0.2956|± |0.0321|
+ |hendrycksTest-high_school_computer_science | 1|acc |0.2800|± |0.0451|
+ | | |acc_norm|0.2800|± |0.0451|
+ |hendrycksTest-high_school_european_history | 1|acc |0.2606|± |0.0343|
+ | | |acc_norm|0.2606|± |0.0343|
+ |hendrycksTest-high_school_geography | 1|acc |0.2626|± |0.0314|
+ | | |acc_norm|0.2626|± |0.0314|
+ |hendrycksTest-high_school_government_and_politics| 1|acc |0.2176|± |0.0298|
+ | | |acc_norm|0.2176|± |0.0298|
+ |hendrycksTest-high_school_macroeconomics | 1|acc |0.2128|± |0.0208|
+ | | |acc_norm|0.2128|± |0.0208|
+ |hendrycksTest-high_school_mathematics | 1|acc |0.2630|± |0.0268|
+ | | |acc_norm|0.2630|± |0.0268|
+ |hendrycksTest-high_school_microeconomics | 1|acc |0.2227|± |0.0270|
+ | | |acc_norm|0.2227|± |0.0270|
+ |hendrycksTest-high_school_physics | 1|acc |0.3046|± |0.0376|
+ | | |acc_norm|0.3046|± |0.0376|
+ |hendrycksTest-high_school_psychology | 1|acc |0.2055|± |0.0173|
+ | | |acc_norm|0.2055|± |0.0173|
+ |hendrycksTest-high_school_statistics | 1|acc |0.4815|± |0.0341|
+ | | |acc_norm|0.4815|± |0.0341|
+ |hendrycksTest-high_school_us_history | 1|acc |0.2059|± |0.0284|
+ | | |acc_norm|0.2059|± |0.0284|
+ |hendrycksTest-high_school_world_history | 1|acc |0.2574|± |0.0285|
+ | | |acc_norm|0.2574|± |0.0285|
+ |hendrycksTest-human_aging | 1|acc |0.2063|± |0.0272|
+ | | |acc_norm|0.2063|± |0.0272|
+ |hendrycksTest-human_sexuality | 1|acc |0.2443|± |0.0377|
+ | | |acc_norm|0.2443|± |0.0377|
+ |hendrycksTest-international_law | 1|acc |0.2727|± |0.0407|
+ | | |acc_norm|0.2727|± |0.0407|
+ |hendrycksTest-jurisprudence | 1|acc |0.2130|± |0.0396|
+ | | |acc_norm|0.2130|± |0.0396|
+ |hendrycksTest-logical_fallacies | 1|acc |0.2515|± |0.0341|
+ | | |acc_norm|0.2515|± |0.0341|
+ |hendrycksTest-machine_learning | 1|acc |0.2321|± |0.0401|
+ | | |acc_norm|0.2321|± |0.0401|
+ |hendrycksTest-management | 1|acc |0.2039|± |0.0399|
+ | | |acc_norm|0.2039|± |0.0399|
+ |hendrycksTest-marketing | 1|acc |0.1966|± |0.0260|
+ | | |acc_norm|0.1966|± |0.0260|
+ |hendrycksTest-medical_genetics | 1|acc |0.3000|± |0.0461|
+ | | |acc_norm|0.3000|± |0.0461|
+ |hendrycksTest-miscellaneous | 1|acc |0.2631|± |0.0157|
+ | | |acc_norm|0.2631|± |0.0157|
+ |hendrycksTest-moral_disputes | 1|acc |0.2457|± |0.0232|
+ | | |acc_norm|0.2457|± |0.0232|
+ |hendrycksTest-moral_scenarios | 1|acc |0.2682|± |0.0148|
+ | | |acc_norm|0.2682|± |0.0148|
+ |hendrycksTest-nutrition | 1|acc |0.2451|± |0.0246|
+ | | |acc_norm|0.2451|± |0.0246|
+ |hendrycksTest-philosophy | 1|acc |0.2605|± |0.0249|
+ | | |acc_norm|0.2605|± |0.0249|
+ |hendrycksTest-prehistory | 1|acc |0.2932|± |0.0253|
+ | | |acc_norm|0.2932|± |0.0253|
+ |hendrycksTest-professional_accounting | 1|acc |0.2340|± |0.0253|
+ | | |acc_norm|0.2340|± |0.0253|
+ |hendrycksTest-professional_law | 1|acc |0.2432|± |0.0110|
+ | | |acc_norm|0.2432|± |0.0110|
+ |hendrycksTest-professional_medicine | 1|acc |0.4301|± |0.0301|
+ | | |acc_norm|0.4301|± |0.0301|
+ |hendrycksTest-professional_psychology | 1|acc |0.2369|± |0.0172|
+ | | |acc_norm|0.2369|± |0.0172|
+ |hendrycksTest-public_relations | 1|acc |0.2091|± |0.0390|
+ | | |acc_norm|0.2091|± |0.0390|
+ |hendrycksTest-security_studies | 1|acc |0.2408|± |0.0274|
+ | | |acc_norm|0.2408|± |0.0274|
+ |hendrycksTest-sociology | 1|acc |0.2388|± |0.0301|
+ | | |acc_norm|0.2388|± |0.0301|
+ |hendrycksTest-us_foreign_policy | 1|acc |0.2600|± |0.0441|
+ | | |acc_norm|0.2600|± |0.0441|
+ |hendrycksTest-virology | 1|acc |0.2048|± |0.0314|
+ | | |acc_norm|0.2048|± |0.0314|
+ |hendrycksTest-world_religions | 1|acc |0.2047|± |0.0309|
+ | | |acc_norm|0.2047|± |0.0309|
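The leaderboard's MMLU figure appears to be the unweighted mean of the 57 `hendrycksTest-*` subtask accuracies above; this is an assumption, not something the card states. The quick check below transcribes the `acc` values from the table, with the `hendrycksTest-` prefix dropped. Under that assumption it reproduces 24.76.

```python
# hendrycksTest-* accuracies (5-shot) transcribed from the table above.
MMLU_ACC = {
    "abstract_algebra": 0.2200, "anatomy": 0.3333, "astronomy": 0.2895,
    "business_ethics": 0.2000, "clinical_knowledge": 0.2189,
    "college_biology": 0.2222, "college_chemistry": 0.1700,
    "college_computer_science": 0.3000, "college_mathematics": 0.2500,
    "college_medicine": 0.1965, "college_physics": 0.2353,
    "computer_security": 0.2000, "conceptual_physics": 0.2043,
    "econometrics": 0.2456, "electrical_engineering": 0.2621,
    "elementary_mathematics": 0.2566, "formal_logic": 0.1587,
    "global_facts": 0.1600, "high_school_biology": 0.3226,
    "high_school_chemistry": 0.2956, "high_school_computer_science": 0.2800,
    "high_school_european_history": 0.2606, "high_school_geography": 0.2626,
    "high_school_government_and_politics": 0.2176,
    "high_school_macroeconomics": 0.2128, "high_school_mathematics": 0.2630,
    "high_school_microeconomics": 0.2227, "high_school_physics": 0.3046,
    "high_school_psychology": 0.2055, "high_school_statistics": 0.4815,
    "high_school_us_history": 0.2059, "high_school_world_history": 0.2574,
    "human_aging": 0.2063, "human_sexuality": 0.2443,
    "international_law": 0.2727, "jurisprudence": 0.2130,
    "logical_fallacies": 0.2515, "machine_learning": 0.2321,
    "management": 0.2039, "marketing": 0.1966, "medical_genetics": 0.3000,
    "miscellaneous": 0.2631, "moral_disputes": 0.2457,
    "moral_scenarios": 0.2682, "nutrition": 0.2451, "philosophy": 0.2605,
    "prehistory": 0.2932, "professional_accounting": 0.2340,
    "professional_law": 0.2432, "professional_medicine": 0.4301,
    "professional_psychology": 0.2369, "public_relations": 0.2091,
    "security_studies": 0.2408, "sociology": 0.2388,
    "us_foreign_policy": 0.2600, "virology": 0.2048, "world_religions": 0.2047,
}

# Unweighted mean over subtasks (each subject counts equally, regardless of size).
mmlu = sum(MMLU_ACC.values()) / len(MMLU_ACC)
print(len(MMLU_ACC), f"{mmlu * 100:.2f}")  # 57 24.76
```

Weighting each subject equally means small subsets such as `high_school_statistics` (0.4815) or `professional_medicine` (0.4301) pull the average up noticeably. Most subjects sit near the 25% chance level for four-way multiple choice.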
+
+ hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/checkpoint-ehgq969i:v13,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 5, batch_size: 16
+ | Task |Version|Metric|Value | |Stderr|
+ |----------|------:|------|-----:|---|-----:|
+ |winogrande| 0|acc |0.5146|± | 0.014|
+
+ hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/checkpoint-ehgq969i:v13,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 5, batch_size: 16
+ |Task |Version|Metric|Value| |Stderr|
+ |-----|------:|------|----:|---|-----:|
+ |gsm8k| 0|acc | 0|± | 0|

  ## Model Details


  ## Model Card Contact

+ [More Information Needed]