kenhktsui committed on
Commit
554e4f0
1 Parent(s): 9704faf

Update README.md

  1. README.md +40 -22
README.md CHANGED
@@ -8,7 +8,7 @@ inference:
  max_new_tokens: 64
  do_sample: true
  temperature: 0.1
- repetition_penalty: 10.0
  no_repeat_ngram_size: 4
  eta_cutoff: 0.0006
  renormalize_logits: true
@@ -17,29 +17,35 @@ widget:
  example_title: El Microondas
  - text: Kennesaw State University is a public
  example_title: Kennesaw State University
- - text: Bungie Studios is an American video game developer. They are most famous for
- developing the award winning Halo series of video games. They also made Destiny.
- The studio was founded
  example_title: Bungie
  - text: The Mona Lisa is a world-renowned painting created by
  example_title: Mona Lisa
- - text: The Harry Potter series, written by J.K. Rowling, begins with the book titled
  example_title: Harry Potter Series
- - text: 'Question: I have cities, but no houses. I have mountains, but no trees. I
  have water, but no fish. What am I?

- Answer:'
  example_title: Riddle
  - text: The process of photosynthesis involves the conversion of
  example_title: Photosynthesis
- - text: Jane went to the store to buy some groceries. She picked up apples, oranges,
  and a loaf of bread. When she got home, she realized she forgot
  example_title: Story Continuation
- - text: 'Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph,
- and another train leaves Station B at 10:00 AM and travels at 80 mph, when will
  they meet if the distance between the stations is 300 miles?

- To determine'
  example_title: Math Problem
  - text: In the context of computer programming, an algorithm is
  example_title: Algorithm Definition
@@ -62,7 +68,8 @@ model-index:
  value: 21.93
  name: normalized accuracy
  source:
- url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
  name: Open LLM Leaderboard
  - task:
  type: text-generation
@@ -78,7 +85,8 @@ model-index:
  value: 27.86
  name: normalized accuracy
  source:
- url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
  name: Open LLM Leaderboard
  - task:
  type: text-generation
@@ -95,7 +103,8 @@ model-index:
  value: 25.34
  name: accuracy
  source:
- url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
  name: Open LLM Leaderboard
  - task:
  type: text-generation
@@ -109,9 +118,10 @@ model-index:
  num_few_shot: 0
  metrics:
  - type: mc2
- value: 46.0
  source:
- url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
  name: Open LLM Leaderboard
  - task:
  type: text-generation
@@ -128,7 +138,8 @@ model-index:
  value: 50.83
  name: accuracy
  source:
- url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
  name: Open LLM Leaderboard
  - task:
  type: text-generation
@@ -142,11 +153,19 @@ model-index:
  num_few_shot: 5
  metrics:
  - type: acc
- value: 0.0
  name: accuracy
  source:
- url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
  name: Open LLM Leaderboard
  ---


@@ -155,7 +174,7 @@ model-index:
  Inspired by [Phi2](https://huggingface.co/microsoft/phi-2), and open source small language model attempts like [smol_llama-101M-GQA](https://huggingface.co/BEE-spoke-data/smol_llama-101M-GQA).
  Pre-trained with training 7B token **from scratch**, with application of quality filter to datasets resulting in 0.26B token.
  The control is [kenhktsui/nano-phi-115M-control-v0.1](https://huggingface.co/kenhktsui/nano-phi-115M-control-v0.1), where full dataset (0.6B) is used.
- Not much degradation in performance despite only using **42%** of the data due to the effective quality filter.
  In fact, upon inspection, the 6000 steps chkpt achieves similar performance as this model, signaling underlying **effective training due to high quality data**.
  It just took 1d to train in Colab with a A100 40GB (**<USD$ 50**).
  It achieves quite competitive results in evaluation given its training token, and training data size.
@@ -569,5 +588,4 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
  |MMLU (5-Shot) |25.34|
  |TruthfulQA (0-shot) |46.00|
  |Winogrande (5-shot) |50.83|
- |GSM8k (5-shot) | 0.00|
-
 
  max_new_tokens: 64
  do_sample: true
  temperature: 0.1
+ repetition_penalty: 10
  no_repeat_ngram_size: 4
  eta_cutoff: 0.0006
  renormalize_logits: true
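
For readers who want to reproduce the widget behaviour locally, the sketch below passes the same settings to `generate()` as keyword arguments. It assumes the `transformers` library and that the checkpoint loads through the `Auto*` classes; neither is stated in this diff, and the helper name `complete` is made up for illustration. The change from `10.0` to `10` should make no difference to generation, since the two values compare equal.

```python
# Generation settings copied from the `inference` block shown in this diff.
GENERATION_KWARGS = {
    "max_new_tokens": 64,
    "do_sample": True,
    "temperature": 0.1,
    "repetition_penalty": 10.0,  # written as `10` after this commit; same value
    "no_repeat_ngram_size": 4,
    "eta_cutoff": 0.0006,
    "renormalize_logits": True,
}


def complete(prompt: str, model_id: str = "kenhktsui/nano-phi-115M-v0.1") -> str:
    """Hypothetical helper: continue `prompt` with the settings above.

    Downloads the model on first use. Assumes the checkpoint is loadable
    with AutoModelForCausalLM, which this diff does not confirm.
    """
    # Imported lazily so the settings can be inspected without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **GENERATION_KWARGS)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `complete("Kennesaw State University is a public")` would mirror one of the widget prompts above.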
 
  example_title: El Microondas
  - text: Kennesaw State University is a public
  example_title: Kennesaw State University
+ - text: >-
+ Bungie Studios is an American video game developer. They are most famous for
+ developing the award winning Halo series of video games. They also made
+ Destiny. The studio was founded
  example_title: Bungie
  - text: The Mona Lisa is a world-renowned painting created by
  example_title: Mona Lisa
+ - text: >-
+ The Harry Potter series, written by J.K. Rowling, begins with the book
+ titled
  example_title: Harry Potter Series
+ - text: >-
+ Question: I have cities, but no houses. I have mountains, but no trees. I
  have water, but no fish. What am I?

+ Answer:
  example_title: Riddle
  - text: The process of photosynthesis involves the conversion of
  example_title: Photosynthesis
+ - text: >-
+ Jane went to the store to buy some groceries. She picked up apples, oranges,
  and a loaf of bread. When she got home, she realized she forgot
  example_title: Story Continuation
+ - text: >-
+ Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph, and
+ another train leaves Station B at 10:00 AM and travels at 80 mph, when will
  they meet if the distance between the stations is 300 miles?

+ To determine
  example_title: Math Problem
  - text: In the context of computer programming, an algorithm is
  example_title: Algorithm Definition
 
  value: 21.93
  name: normalized accuracy
  source:
+ url: >-
+ https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
  name: Open LLM Leaderboard
  - task:
  type: text-generation
 
  value: 27.86
  name: normalized accuracy
  source:
+ url: >-
+ https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
  name: Open LLM Leaderboard
  - task:
  type: text-generation
 
  value: 25.34
  name: accuracy
  source:
+ url: >-
+ https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
  name: Open LLM Leaderboard
  - task:
  type: text-generation
 
  num_few_shot: 0
  metrics:
  - type: mc2
+ value: 46
  source:
+ url: >-
+ https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
  name: Open LLM Leaderboard
  - task:
  type: text-generation
 
  value: 50.83
  name: accuracy
  source:
+ url: >-
+ https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
  name: Open LLM Leaderboard
  - task:
  type: text-generation
 
  num_few_shot: 5
  metrics:
  - type: acc
+ value: 0
  name: accuracy
  source:
+ url: >-
+ https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
  name: Open LLM Leaderboard
+ datasets:
+ - kenhktsui/minipile_quality_score_v1
+ - kenhktsui/simple_wikipedia_LM_quality_score_v1
+ - kenhktsui/refinedweb-3m_quality_score_v1
+ - kenhktsui/TM-DATA_quality_score_v1
+ - kenhktsui/openwebtext_quality_score_v1
+
  ---


 
  Inspired by [Phi2](https://huggingface.co/microsoft/phi-2), and open source small language model attempts like [smol_llama-101M-GQA](https://huggingface.co/BEE-spoke-data/smol_llama-101M-GQA).
  Pre-trained with training 7B token **from scratch**, with application of quality filter to datasets resulting in 0.26B token.
  The control is [kenhktsui/nano-phi-115M-control-v0.1](https://huggingface.co/kenhktsui/nano-phi-115M-control-v0.1), where full dataset (0.6B) is used.
+ Not much degradation in performance despite only using **42%** of the data due to the effective quality filter ("quality_score_v1" > 0.5).
  In fact, upon inspection, the 6000 steps chkpt achieves similar performance as this model, signaling underlying **effective training due to high quality data**.
  It just took 1d to train in Colab with a A100 40GB (**<USD$ 50**).
  It achieves quite competitive results in evaluation given its training token, and training data size.
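
The filter condition added in this hunk, `"quality_score_v1" > 0.5`, can be read as a strict threshold on a per-document score. The sketch below illustrates that reading on a handful of in-memory records. The sample texts and scores are invented, the helper name `keep_high_quality` is hypothetical, and the assumption that each record carries a numeric `quality_score_v1` field is inferred from the dataset names rather than confirmed by the diff.

```python
QUALITY_THRESHOLD = 0.5  # threshold quoted in the README line above


def keep_high_quality(records, score_key="quality_score_v1", threshold=QUALITY_THRESHOLD):
    """Return only the records whose score is strictly greater than the threshold."""
    return [record for record in records if record[score_key] > threshold]


# Made-up records for illustration only; they are not taken from the real datasets.
sample = [
    {"text": "A clear explanation of photosynthesis.", "quality_score_v1": 0.91},
    {"text": "Borderline document.", "quality_score_v1": 0.50},
    {"text": "click here buy now!!!", "quality_score_v1": 0.12},
    {"text": "A short encyclopedia entry.", "quality_score_v1": 0.77},
]

filtered = keep_high_quality(sample)
retained_fraction = len(filtered) / len(sample)
print(f"kept {len(filtered)} of {len(sample)} records ({retained_fraction:.0%})")
```

Because the comparison is strict, a record scoring exactly 0.5 is dropped. The retained fraction printed here describes only the toy sample and says nothing about the 42% figure in the README.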
 
  |MMLU (5-Shot) |25.34|
  |TruthfulQA (0-shot) |46.00|
  |Winogrande (5-shot) |50.83|
+ |GSM8k (5-shot) | 0.00|