agentlans committed
Commit: e482760
Parent: 7ede264

Upload 11 files

README.md CHANGED
@@ -1,3 +1,108 @@
- ---
- license: apache-2.0
- ---
+ # Pythia-14M Fine-Tuned for High-Quality English Sentence Generation
+
+ This model is a fine-tuned version of the Pythia-14M language model, optimized for generating high-quality English sentences. It builds upon the base model [agentlans/pythia-14m-finewebedu-sentences](https://huggingface.co/agentlans/pythia-14m-finewebedu-sentences) and has been further trained on a curated dataset of well-formed English sentences, [agentlans/high-quality-english-sentences](https://huggingface.co/datasets/agentlans/high-quality-english-sentences).
+
+ ## Model Description
+
+ The model is based on the Pythia-14M architecture, a relatively compact GPT-NeoX-style language model. It has been fine-tuned specifically to generate (mostly) grammatically correct and coherent English sentences across a variety of topics and styles.
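+
+ As a quick sanity check, the snippet below loads the checkpoint and prints its parameter count, which should come out to roughly 14 million (a minimal sketch, assuming only the `transformers` library is installed):
+
+ ```python
+ from transformers import AutoModelForCausalLM
+
+ # Download the fine-tuned checkpoint from the Hugging Face Hub
+ model = AutoModelForCausalLM.from_pretrained("agentlans/pythia-14m-sentences")
+ print(f"{model.num_parameters():,} parameters")  # expect roughly 14M
+ ```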
+
+ ## Intended Uses & Limitations
+
+ This model is designed for:
+ - Generating high-quality English sentences
+ - Completing partial sentences
+ - Assisting with writing tasks that require well-formed English
+
+ Limitations:
+ - Not suitable for tasks requiring deep domain knowledge
+ - May struggle with very long-form text generation
+ - Fails on non-English text
+ - It's a tiny model, so don't expect too much
+
+ ## Training Data
+
+ The model was fine-tuned on a combination of datasets:
+ - Web-scraped educational content (FineWeb-Edu)
+ - High-quality web text (FineWeb)
+ - Filtered Common Crawl data (C4)
+
+ For the composition and preprocessing of the training data, see [agentlans/high-quality-english-sentences](https://huggingface.co/datasets/agentlans/high-quality-english-sentences).
+
+ ## How To Use
+
+ To generate 10 random sentences from an empty prompt on a CUDA device:
+
+ ```python
+ from transformers import pipeline, set_seed
+
+ # Load the model onto a CUDA GPU
+ generator = pipeline('text-generation', model='agentlans/pythia-14m-sentences', device='cuda')
+
+ set_seed(1234)  # make sampling reproducible
+ results = generator("", max_length=100, num_return_sequences=10, do_sample=True)
+
+ for x in results:
+     print(x['generated_text'])
+ ```
+
+ Output:
+ ```text
+ The most common cause of the number of diseases is the common cause of death.
+ And there are many people in the war.
+ The average household income is 35.5 percent.
+ He was the most influential theologians of the country in this world.
+ On the other hand, the students will be able to learn the value of the current and the time.
+ However, the effect of the study would be greater than that of a drug-related drug drug.
+ To understand today, our nation's largest international commitment to the use of new technology and technology across the country.
+ On Sunday, the UK was first held in the state of the Australian, where a foreign trade union was used since the first year.
+ I've said that the program is most effective in education in the middle of the world.
+ So a year, it is important to identify a community where a student has a disability.
+ ```
+
+ To let the model continue a sentence:
+
+ ```python
+ results = generator("The meaning of life is", max_length=100, num_return_sequences=10, do_sample=True)
+ for x in results:
+     print(x['generated_text'])
+ ```
+
+ Output:
+ ```text
+ The meaning of life is one of the most extraordinary stories of the great world, and some of the most brilliant examples of the world of science.
+ The meaning of life is to develop.
+ The meaning of life is to the person, or to make it a personal impression of what is the case for the reader.
+ The meaning of life is no longer the most important concept of the human language.
+ The meaning of life is the form of a personal or personal character.
+ The meaning of life is the world's real and our future.
+ The meaning of life is the true one of the nation's largest historical experiences.
+ The meaning of life is the basis of the Church's first, the church of the Holy Spirit, and a living faith.
+ The meaning of life is that the law requires that the truth be lost.
+ The meaning of life is the best reason for the poor and poor economy.
+ ```
+
+ ## Training Procedure
+
+ The model was trained with the following hyperparameters (see the `TrainingArguments` sketch after the list):
+ - Learning rate: 5e-05
+ - Train batch size: 8
+ - Eval batch size: 8
+ - Optimizer: Adam (betas=(0.9, 0.999), epsilon=1e-08)
+ - LR scheduler: linear
+ - Number of epochs: 3.0
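+
+ A minimal sketch of how these settings map onto the Hugging Face `Trainer` API (the `output_dir` value is a hypothetical placeholder; dataset loading and tokenization are omitted):
+
+ ```python
+ from transformers import TrainingArguments
+
+ # Hyperparameters as reported above
+ training_args = TrainingArguments(
+     output_dir="pythia-14m-sentences",  # hypothetical output path
+     learning_rate=5e-05,
+     per_device_train_batch_size=8,
+     per_device_eval_batch_size=8,
+     adam_beta1=0.9,
+     adam_beta2=0.999,
+     adam_epsilon=1e-08,
+     lr_scheduler_type="linear",
+     num_train_epochs=3.0,
+ )
+ ```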
+
+ ## Evaluation Results
+
+ On the evaluation set, the model achieved:
+ - Loss: 6.2540
+ - Accuracy: 0.1776
+ - Perplexity: 520.07
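+
+ Perplexity is the exponential of the cross-entropy loss, so the reported value can be double-checked in one line:
+
+ ```python
+ import math
+ print(math.exp(6.2540))  # ≈ 520.1, matching the reported perplexity
+ ```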
+
+
+ ## Ethical Considerations
+
+ As with any text generation model, users should be aware of potential biases in the training data that may be reflected in the model's outputs. The model should not be used to generate or propagate harmful content.
+
+ ## Technical Specifications
+
+ - Transformers: 4.45.1
+ - PyTorch: 2.4.1+cu121
+ - Datasets: 3.0.1
+ - Tokenizers: 0.20.0
all_results.json ADDED
@@ -0,0 +1,16 @@
+ {
+     "epoch": 3.0,
+     "eval_accuracy": 0.1775901465932144,
+     "eval_loss": 6.25395393371582,
+     "eval_runtime": 26.5184,
+     "eval_samples": 4553,
+     "eval_samples_per_second": 171.692,
+     "eval_steps_per_second": 21.495,
+     "perplexity": 520.0650675835933,
+     "total_flos": 5764753863475200.0,
+     "train_loss": 6.457221655868903,
+     "train_runtime": 1165.3935,
+     "train_samples": 40997,
+     "train_samples_per_second": 105.536,
+     "train_steps_per_second": 13.193
+ }
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+     "_name_or_path": "agentlans/pythia-14m-finewebedu-sentences",
+     "architectures": [
+         "GPTNeoXForCausalLM"
+     ],
+     "attention_bias": true,
+     "attention_dropout": 0.0,
+     "bos_token_id": 0,
+     "classifier_dropout": 0.1,
+     "eos_token_id": 0,
+     "hidden_act": "gelu",
+     "hidden_dropout": 0.0,
+     "hidden_size": 128,
+     "initializer_range": 0.02,
+     "intermediate_size": 512,
+     "layer_norm_eps": 1e-05,
+     "max_position_embeddings": 2048,
+     "model_type": "gpt_neox",
+     "num_attention_heads": 4,
+     "num_hidden_layers": 6,
+     "partial_rotary_factor": 0.25,
+     "rope_scaling": null,
+     "rope_theta": 10000,
+     "rotary_emb_base": 10000,
+     "rotary_pct": 0.25,
+     "tie_word_embeddings": false,
+     "torch_dtype": "float32",
+     "transformers_version": "4.45.1",
+     "use_cache": true,
+     "use_parallel_residual": true,
+     "vocab_size": 50304
+ }
eval_results.json ADDED
@@ -0,0 +1,10 @@
+ {
+     "epoch": 3.0,
+     "eval_accuracy": 0.1775901465932144,
+     "eval_loss": 6.25395393371582,
+     "eval_runtime": 26.5184,
+     "eval_samples": 4553,
+     "eval_samples_per_second": 171.692,
+     "eval_steps_per_second": 21.495,
+     "perplexity": 520.0650675835933
+ }
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+     "_from_model_config": true,
+     "bos_token_id": 0,
+     "eos_token_id": 0,
+     "transformers_version": "4.45.1"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d57815e9690ab850851289477ac83e03a76c877aea0f87ed07a16c3f13da5507
+ size 56279344
special_tokens_map.json ADDED
@@ -0,0 +1,23 @@
+ {
+     "bos_token": {
+         "content": "<|endoftext|>",
+         "lstrip": false,
+         "normalized": false,
+         "rstrip": false,
+         "single_word": false
+     },
+     "eos_token": {
+         "content": "<|endoftext|>",
+         "lstrip": false,
+         "normalized": false,
+         "rstrip": false,
+         "single_word": false
+     },
+     "unk_token": {
+         "content": "<|endoftext|>",
+         "lstrip": false,
+         "normalized": false,
+         "rstrip": false,
+         "single_word": false
+     }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,214 @@
+ {
+     "add_bos_token": false,
+     "add_eos_token": false,
+     "add_prefix_space": false,
+     "added_tokens_decoder": {
+         "0": {
+             "content": "<|endoftext|>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "1": {
+             "content": "<|padding|>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "50254": {
+             "content": " ",
+             "lstrip": false,
+             "normalized": true,
+             "rstrip": false,
+             "single_word": false,
+             "special": false
+         },
+         "50255": {
+             "content": " ",
+             "lstrip": false,
+             "normalized": true,
+             "rstrip": false,
+             "single_word": false,
+             "special": false
+         },
+         "50256": {
+             "content": " ",
+             "lstrip": false,
+             "normalized": true,
+             "rstrip": false,
+             "single_word": false,
+             "special": false
+         },
+         "50257": {
+             "content": " ",
+             "lstrip": false,
+             "normalized": true,
+             "rstrip": false,
+             "single_word": false,
+             "special": false
+         },
+         "50258": {
+             "content": " ",
+             "lstrip": false,
+             "normalized": true,
+             "rstrip": false,
+             "single_word": false,
+             "special": false
+         },
+         "50259": {
+             "content": " ",
+             "lstrip": false,
+             "normalized": true,
+             "rstrip": false,
+             "single_word": false,
+             "special": false
+         },
+         "50260": {
+             "content": " ",
+             "lstrip": false,
+             "normalized": true,
+             "rstrip": false,
+             "single_word": false,
+             "special": false
+         },
+         "50261": {
+             "content": " ",
+             "lstrip": false,
+             "normalized": true,
+             "rstrip": false,
+             "single_word": false,
+             "special": false
+         },
+         "50262": {
+             "content": " ",
+             "lstrip": false,
+             "normalized": true,
+             "rstrip": false,
+             "single_word": false,
+             "special": false
+         },
+         "50263": {
+             "content": " ",
+             "lstrip": false,
+             "normalized": true,
+             "rstrip": false,
+             "single_word": false,
+             "special": false
+         },
+         "50264": {
+             "content": " ",
+             "lstrip": false,
+             "normalized": true,
+             "rstrip": false,
+             "single_word": false,
+             "special": false
+         },
+         "50265": {
+             "content": " ",
+             "lstrip": false,
+             "normalized": true,
+             "rstrip": false,
+             "single_word": false,
+             "special": false
+         },
+         "50266": {
+             "content": " ",
+             "lstrip": false,
+             "normalized": true,
+             "rstrip": false,
+             "single_word": false,
+             "special": false
+         },
+         "50267": {
+             "content": " ",
+             "lstrip": false,
+             "normalized": true,
+             "rstrip": false,
+             "single_word": false,
+             "special": false
+         },
+         "50268": {
+             "content": " ",
+             "lstrip": false,
+             "normalized": true,
+             "rstrip": false,
+             "single_word": false,
+             "special": false
+         },
+         "50269": {
+             "content": " ",
+             "lstrip": false,
+             "normalized": true,
+             "rstrip": false,
+             "single_word": false,
+             "special": false
+         },
+         "50270": {
+             "content": " ",
+             "lstrip": false,
+             "normalized": true,
+             "rstrip": false,
+             "single_word": false,
+             "special": false
+         },
+         "50271": {
+             "content": " ",
+             "lstrip": false,
+             "normalized": true,
+             "rstrip": false,
+             "single_word": false,
+             "special": false
+         },
+         "50272": {
+             "content": " ",
+             "lstrip": false,
+             "normalized": true,
+             "rstrip": false,
+             "single_word": false,
+             "special": false
+         },
+         "50273": {
+             "content": " ",
+             "lstrip": false,
+             "normalized": true,
+             "rstrip": false,
+             "single_word": false,
+             "special": false
+         },
+         "50274": {
+             "content": " ",
+             "lstrip": false,
+             "normalized": true,
+             "rstrip": false,
+             "single_word": false,
+             "special": false
+         },
+         "50275": {
+             "content": " ",
+             "lstrip": false,
+             "normalized": true,
+             "rstrip": false,
+             "single_word": false,
+             "special": false
+         },
+         "50276": {
+             "content": " ",
+             "lstrip": false,
+             "normalized": true,
+             "rstrip": false,
+             "single_word": false,
+             "special": false
+         }
+     },
+     "bos_token": "<|endoftext|>",
+     "clean_up_tokenization_spaces": true,
+     "eos_token": "<|endoftext|>",
+     "model_max_length": 1000000000000000019884624838656,
+     "pad_token": null,
+     "tokenizer_class": "GPTNeoXTokenizer",
+     "unk_token": "<|endoftext|>"
+ }
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+     "epoch": 3.0,
+     "total_flos": 5764753863475200.0,
+     "train_loss": 6.457221655868903,
+     "train_runtime": 1165.3935,
+     "train_samples": 40997,
+     "train_samples_per_second": 105.536,
+     "train_steps_per_second": 13.193
+ }
trainer_state.json ADDED
@@ -0,0 +1,252 @@
+ {
+     "best_metric": null,
+     "best_model_checkpoint": null,
+     "epoch": 3.0,
+     "eval_steps": 500,
+     "global_step": 15375,
+     "is_hyper_param_search": false,
+     "is_local_process_zero": true,
+     "is_world_process_zero": true,
+     "log_history": [
+         {
+             "epoch": 0.0975609756097561,
+             "grad_norm": 19.442411422729492,
+             "learning_rate": 4.8373983739837406e-05,
+             "loss": 6.7559,
+             "step": 500
+         },
+         {
+             "epoch": 0.1951219512195122,
+             "grad_norm": 22.672739028930664,
+             "learning_rate": 4.6747967479674795e-05,
+             "loss": 6.6932,
+             "step": 1000
+         },
+         {
+             "epoch": 0.2926829268292683,
+             "grad_norm": 21.795516967773438,
+             "learning_rate": 4.51219512195122e-05,
+             "loss": 6.6652,
+             "step": 1500
+         },
+         {
+             "epoch": 0.3902439024390244,
+             "grad_norm": 19.84381866455078,
+             "learning_rate": 4.3495934959349595e-05,
+             "loss": 6.6335,
+             "step": 2000
+         },
+         {
+             "epoch": 0.4878048780487805,
+             "grad_norm": 14.22912883758545,
+             "learning_rate": 4.186991869918699e-05,
+             "loss": 6.6248,
+             "step": 2500
+         },
+         {
+             "epoch": 0.5853658536585366,
+             "grad_norm": 14.391462326049805,
+             "learning_rate": 4.0243902439024395e-05,
+             "loss": 6.5929,
+             "step": 3000
+         },
+         {
+             "epoch": 0.6829268292682927,
+             "grad_norm": 19.81720733642578,
+             "learning_rate": 3.861788617886179e-05,
+             "loss": 6.5589,
+             "step": 3500
+         },
+         {
+             "epoch": 0.7804878048780488,
+             "grad_norm": 15.33761978149414,
+             "learning_rate": 3.699186991869919e-05,
+             "loss": 6.5327,
+             "step": 4000
+         },
+         {
+             "epoch": 0.8780487804878049,
+             "grad_norm": 14.190281867980957,
+             "learning_rate": 3.5365853658536584e-05,
+             "loss": 6.5175,
+             "step": 4500
+         },
+         {
+             "epoch": 0.975609756097561,
+             "grad_norm": 16.57828712463379,
+             "learning_rate": 3.373983739837399e-05,
+             "loss": 6.5137,
+             "step": 5000
+         },
+         {
+             "epoch": 1.0731707317073171,
+             "grad_norm": 16.75761604309082,
+             "learning_rate": 3.2113821138211384e-05,
+             "loss": 6.495,
+             "step": 5500
+         },
+         {
+             "epoch": 1.170731707317073,
+             "grad_norm": 18.840726852416992,
+             "learning_rate": 3.048780487804878e-05,
+             "loss": 6.4757,
+             "step": 6000
+         },
+         {
+             "epoch": 1.2682926829268293,
+             "grad_norm": 17.630483627319336,
+             "learning_rate": 2.886178861788618e-05,
+             "loss": 6.4633,
+             "step": 6500
+         },
+         {
+             "epoch": 1.3658536585365852,
+             "grad_norm": 16.721818923950195,
+             "learning_rate": 2.7235772357723577e-05,
+             "loss": 6.4462,
+             "step": 7000
+         },
+         {
+             "epoch": 1.4634146341463414,
+             "grad_norm": 14.650636672973633,
+             "learning_rate": 2.5609756097560977e-05,
+             "loss": 6.4404,
+             "step": 7500
+         },
+         {
+             "epoch": 1.5609756097560976,
+             "grad_norm": 13.825970649719238,
+             "learning_rate": 2.3983739837398377e-05,
+             "loss": 6.4326,
+             "step": 8000
+         },
+         {
+             "epoch": 1.6585365853658538,
+             "grad_norm": 11.85326862335205,
+             "learning_rate": 2.2357723577235773e-05,
+             "loss": 6.4239,
+             "step": 8500
+         },
+         {
+             "epoch": 1.7560975609756098,
+             "grad_norm": 13.92196273803711,
+             "learning_rate": 2.073170731707317e-05,
+             "loss": 6.4098,
+             "step": 9000
+         },
+         {
+             "epoch": 1.8536585365853657,
+             "grad_norm": 12.077308654785156,
+             "learning_rate": 1.9105691056910573e-05,
+             "loss": 6.3987,
+             "step": 9500
+         },
+         {
+             "epoch": 1.951219512195122,
+             "grad_norm": 12.406614303588867,
+             "learning_rate": 1.747967479674797e-05,
+             "loss": 6.3957,
+             "step": 10000
+         },
+         {
+             "epoch": 2.048780487804878,
+             "grad_norm": 14.001736640930176,
+             "learning_rate": 1.5853658536585366e-05,
+             "loss": 6.3752,
+             "step": 10500
+         },
+         {
+             "epoch": 2.1463414634146343,
+             "grad_norm": 12.691810607910156,
+             "learning_rate": 1.4227642276422764e-05,
+             "loss": 6.3566,
+             "step": 11000
+         },
+         {
+             "epoch": 2.2439024390243905,
+             "grad_norm": 10.062420845031738,
+             "learning_rate": 1.2601626016260162e-05,
+             "loss": 6.3492,
+             "step": 11500
+         },
+         {
+             "epoch": 2.341463414634146,
+             "grad_norm": 11.78906536102295,
+             "learning_rate": 1.0975609756097562e-05,
+             "loss": 6.3447,
+             "step": 12000
+         },
+         {
+             "epoch": 2.4390243902439024,
+             "grad_norm": 13.368131637573242,
+             "learning_rate": 9.34959349593496e-06,
+             "loss": 6.339,
+             "step": 12500
+         },
+         {
+             "epoch": 2.5365853658536586,
+             "grad_norm": 12.125652313232422,
+             "learning_rate": 7.723577235772358e-06,
+             "loss": 6.3305,
+             "step": 13000
+         },
+         {
+             "epoch": 2.6341463414634148,
+             "grad_norm": 13.748695373535156,
+             "learning_rate": 6.0975609756097564e-06,
+             "loss": 6.3205,
+             "step": 13500
+         },
+         {
+             "epoch": 2.7317073170731705,
+             "grad_norm": 13.787367820739746,
+             "learning_rate": 4.471544715447155e-06,
+             "loss": 6.3196,
+             "step": 14000
+         },
+         {
+             "epoch": 2.8292682926829267,
+             "grad_norm": 15.013029098510742,
+             "learning_rate": 2.8455284552845528e-06,
+             "loss": 6.3116,
+             "step": 14500
+         },
+         {
+             "epoch": 2.926829268292683,
+             "grad_norm": 15.244904518127441,
+             "learning_rate": 1.2195121951219514e-06,
+             "loss": 6.3107,
+             "step": 15000
+         },
+         {
+             "epoch": 3.0,
+             "step": 15375,
+             "total_flos": 5764753863475200.0,
+             "train_loss": 6.457221655868903,
+             "train_runtime": 1165.3935,
+             "train_samples_per_second": 105.536,
+             "train_steps_per_second": 13.193
+         }
+     ],
+     "logging_steps": 500,
+     "max_steps": 15375,
+     "num_input_tokens_seen": 0,
+     "num_train_epochs": 3,
+     "save_steps": 500,
+     "stateful_callbacks": {
+         "TrainerControl": {
+             "args": {
+                 "should_epoch_stop": false,
+                 "should_evaluate": false,
+                 "should_log": false,
+                 "should_save": true,
+                 "should_training_stop": true
+             },
+             "attributes": {}
+         }
+     },
+     "total_flos": 5764753863475200.0,
+     "train_batch_size": 8,
+     "trial_name": null,
+     "trial_params": null
+ }