michaeldinzinger committed
Commit 9d8b836 · 1 Parent(s): f1de4b4
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+     "word_embedding_dimension": 768,
+     "pooling_mode_cls_token": false,
+     "pooling_mode_mean_tokens": true,
+     "pooling_mode_max_tokens": false,
+     "pooling_mode_mean_sqrt_len_tokens": false,
+     "pooling_mode_weightedmean_tokens": false,
+     "pooling_mode_lasttoken": false,
+     "include_prompt": true
+ }
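The config above selects masked mean pooling (`pooling_mode_mean_tokens: true`). As a rough illustration of what that setting means, here is a minimal sketch in plain Python; `mean_pool` is a hypothetical helper for illustration, not the library's actual PyTorch implementation:

```python
# Sketch of attention-masked mean pooling, as selected by
# "pooling_mode_mean_tokens": true above. Padding positions
# (mask == 0) are excluded from the average.

def mean_pool(token_embeddings, attention_mask):
    """Average token embeddings, counting only non-padding positions."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, mask in zip(token_embeddings, attention_mask):
        if mask:  # skip padding tokens
            summed = [s + v for s, v in zip(summed, vec)]
            count += 1
    return [s / count for s in summed]

# Example: 3 tokens of dimension 2, the last one is padding
emb = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(emb, mask))  # [2.0, 3.0]
```

In the real model the same averaging runs over 768-dimensional token embeddings, producing the 768-dimensional sentence vector.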
README.md CHANGED
@@ -1,3 +1,883 @@
- ---
- license: mit
- ---
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:502912
+ - loss:MarginMSELoss
+ base_model: FacebookAI/xlm-roberta-base
+ widget:
+ - source_sentence: how to get rid of an iron mark
+   sentences:
+   - Quick Answer. A good remedy for removing shiny iron scorch marks from fabric is
+     to use hydrogen peroxide with ammonia. Other options for removing shiny scorch
+     marks include laundry detergent, bleach or vinegar, but it depends on how quickly
+     the scorch is remedied. Keep Learning.
+   - Largely due to declining sales, in 2006, Tommy Hilfiger sold his company for $1.6
+     billion, or $16.80 a share, to Apax Partners, a private investment company. In
+     March 2010, Phillips-Van Heusen, owner of Calvin Klein, bought the Tommy Hilfiger
+     Corporation for $3 billion.
+   - You need to heat continuous until it turns to a paste. Use this simple mixture
+     by rubbing it right onto the soleplate. Now, make sure that the iron is unplugged
+     before cleaning it. After rubbing the mixture, with the help of a nice, clean
+     cloth wipe the unsightly scorch marks off your iron. 5 people found this useful.
+ - source_sentence: how much does ipl facial cost
+   sentences:
+   - "Prices usually vary according to the ipl treatment size. The average cost for\
+     \ a FotoFacial/IPL is $350 - $600 each treatment, depending on the body part.A\
+     \ consultation with a fotofacial specialist and the number of treatments needed\
+     \ will determine your ipl treatment cost.s discussed above, IPL does not damage\
+     \ the skin surface, unlike dermabrasion and laser resurfacing. Therefore, there\
+     \ is virtually no recovery time.” Treatments take approximately 30-45\
+     \ minutes. Patients can apply makeup before leaving the office and return to work\
+     \ the same day."
+   - "—N.I., South Kingstown, Rhode IslandGenerally, the oven temperature will\
+     \ not need to be adjusted when baking mini or jumbo muffins, but the baking time\
+     \ will most likely need to be altered. Mini muffins will take anywhere from 10\
+     \ to 15 minutes while jumbo muffins will bake from 20 to 40 minutes.Check jumbo\
+     \ muffins for doneness after 20 minutes, then every 5 to 10 minutes. Keep in mind\
+     \ that the baking time will vary according to the recipe.The variation is due\
+     \ to the oven temperature and the amount of batter in each muffin cup.heck jumbo\
+     \ muffins for doneness after 20 minutes, then every 5 to 10 minutes. Keep in mind\
+     \ that the baking time will vary according to the recipe. The variation is due\
+     \ to the oven temperature and the amount of batter in each muffin cup."
+   - Prices usually vary according to the ipl treatment size. The average cost for
+     a FotoFacial/IPL is $350 - $600 each treatment, depending on the body part.A consultation
+     with a fotofacial specialist and the number of treatments needed will determine
+     your ipl treatment cost.PL, which stands for intensed pulsed light, is non-ablative
+     meaning that is does not damage the surface of the skin. The intense light is
+     delivered to the deeper parts of the skin (dermis) and leaves the superficial
+     aspect of the skin (epidermis) untouched.
+ - source_sentence: who voiced scooby doo
+   sentences:
+   - Don Messick originated the voice of Scooby-Doo, and was the voice of the character
+     for over 25 years until his retirement from voice acting in 1996 (he subsequently
+     passed away the following year).rank Welker's Scooby-Doo voice is pretty much
+     identical to the voice he used for Brain on Inspector Gadget.. Hadley Kay and
+     Neil Fanning are the worst, IMO...
+   - because als symptoms include fatigue muscle weakness and muscle twitches early
+     on it can look like other very treatable illnesses one that commonly comes up
+     is lyme disease an infectious disease resulting from a tick bite unlike als lyme
+     is usually treatable with antibiotics lyme disease does not cause als and generally
+     in a diagnostic workup a neurologist can easily separate als from lyme infections
+     either clinically or with testing
+   - "Curse Of The Lake Monster while Frank Welker voices him. 1 Don Messick (1969–1996)\
+     \ 2 Hadley Kay (Johnny Bravo) 3 Scott Innes (1998–2001) Frank\
+     \ Welker (2002–present plus Scooby- 1 Doo! Neil Fanning (2002 and 2004\
+     \ live-action films) Dave Coulier(2005 in Robot 1 Chicken) In Denmark, Scooby\
+     \ Doo is voiced by Lars Thiesgaard."
+ - source_sentence: track lighting that can be mounted on wall
+   sentences:
+   - Madison is one of 14 Community Plan areas in the Metro Nashville-Davidson County
+     area for which zoning and land use planning is done. The 2015-updated Community
+     Plan for Madison, an 89-page document adopted by the Metropolitan Planning Commission,
+     was updated in 2015 as part of NashvilleNext's long-term planning.
+   - This three-light plug-in LED track kit can be surface-mounted anywhere in a room
+     as the power feed cord eliminates the need for a ...junction box. Quick and easy
+     to install, it features three 12 watt LED bullet heads that pivot in a cradle
+     and produce a spotlight beam of energy-saving light.
+   - This white finish, three-light LED track kit can be surface-mounted or suspended
+     from the ceiling with pendant lighting accessorie...s. A power feed cord and plug
+     lets you install it easily without the need for a junction box. Arrange the three,...
+ - source_sentence: Most Common Apple Varieties
+   sentences:
+   - "Well, rest easy, because this condensed list of the 18 most popular apple varieties\
+     \ breaks down the information every apple eater should know—how to cook\
+     \ them, best recipes, and when they are in season. Red Delicious: A popular eating\
+     \ apple that looks just how we all imagine an apple should."
+   - Here's a look at the top 50 draft-eligible prospects for next year, led by quarterback
+     Connor Cook, defensive lineman Joey Bosa, cornerback Vernon Hargreaves, and offensive
+     tackle Ronnie Stanley and Laremy Tunsil.
+   - The most popular apple varieties are Cortland, Red Delicious, Golden Delicious,
+     Empire, Fuji, Gala, Ida Red, Macoun, McIntosh, Northern Spy, and Winesap. Olwen
+     Woodier also offers descriptions for an additional 20 varieties of apples in this
+     very useful and informative cookbook. Cortland.
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ ---
+
+ # SentenceTransformer based on FacebookAI/xlm-roberta-base
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [FacebookAI/xlm-roberta-base](https://huggingface.co/FacebookAI/xlm-roberta-base). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [FacebookAI/xlm-roberta-base](https://huggingface.co/FacebookAI/xlm-roberta-base) <!-- at revision e73636d4f797dec63c3081bb6ed5c7b0bb3f2089 -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("sentence_transformers_model_id")
+ # Run inference
+ sentences = [
+     'Most Common Apple Varieties',
+     'The most popular apple varieties are Cortland, Red Delicious, Golden Delicious, Empire, Fuji, Gala, Ida Red, Macoun, McIntosh, Northern Spy, and Winesap. Olwen Woodier also offers descriptions for an additional 20 varieties of apples in this very useful and informative cookbook. Cortland.',
+     'Well, rest easy, because this condensed list of the 18 most popular apple varieties breaks down the information every apple eater should know—how to cook them, best recipes, and when they are in season. Red Delicious: A popular eating apple that looks just how we all imagine an apple should.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
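The Model Description above lists Cosine Similarity as the similarity function, which is what `model.similarity` applies pairwise to produce the [3, 3] matrix. A minimal, library-independent sketch of that computation (`cosine` is a hypothetical helper for illustration, not the sentence-transformers API):

```python
import math

# Cosine similarity between two vectors: the dot product of the
# vectors divided by the product of their Euclidean norms.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(round(cosine([1.0, 0.0], [1.0, 0.0]), 3))  # 1.0 (identical direction)
print(round(cosine([1.0, 0.0], [0.0, 1.0]), 3))  # 0.0 (orthogonal)
```

Applying `cosine` to every pair of the three 768-dimensional embeddings yields the same 3×3 similarity matrix shape as the usage snippet.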
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 502,912 training samples
+ * Columns: <code>sentence_0</code>, <code>sentence_1</code>, <code>sentence_2</code>, and <code>label</code>
+ * Approximate statistics based on the first 1000 samples:
+   | | sentence_0 | sentence_1 | sentence_2 | label |
+   |:--------|:---------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:--------------------------------------------------------------------|
+   | type | string | string | string | float |
+   | details | <ul><li>min: 4 tokens</li><li>mean: 9.88 tokens</li><li>max: 59 tokens</li></ul> | <ul><li>min: 19 tokens</li><li>mean: 88.5 tokens</li><li>max: 232 tokens</li></ul> | <ul><li>min: 17 tokens</li><li>mean: 87.87 tokens</li><li>max: 282 tokens</li></ul> | <ul><li>min: -16.56</li><li>mean: 0.96</li><li>max: 20.84</li></ul> |
+ * Samples:
+   | sentence_0 | sentence_1 | sentence_2 | label |
+   |:------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------|
+   | <code>how long are bank issued checks good for</code> | <code>Your mom is correct....most checks are good for anywhere between 180 days up to 1 year. Sorry, but you probably won't be able to cash those checks, although it never hurts to check with your bank on the issue. DH · 9 years ago.</code> | <code>Non-local personal and business checks. If the check is from a bank in a different federal reserve district than the depositing bank, it can be held for 5 business days under normal circumstances. Exceptions for new customers during the first 30 days. Banks are not required to give next day ability on the first $100 of deposits, and both local and non-local personal and business checks can be held for a maximum of 11 business days.</code> | <code>2.6526598930358887</code> |
+   | <code>11:11 meaning</code> | <code>11-11-11 11:11:11 example. 11-11 11:11 example. Numerologists believe that events linked to the time 11:11 appear more often than can be explained by chance or coincidence. This belief is related to the concept of synchronicity. Some authors claim that seeing 11:11 on a clock is an auspicious sign.</code> | <code>Sometimes it's difficult to describe what seeing the 11:11 means, because it is a personal experience for everyone. If you feel you are having these experiences for a reason, then it might be that only you will know what these number prompts and wake-up calls mean.</code> | <code>-1.3284940719604492</code> |
+   | <code>did someone from pawn stars die</code> | <code>Did someone from pawn stars on history channel die? kgb answers » Arts & Entertainment » Actors and Actresses » Did someone from pawn stars on history channel die? None from the actors & cast of Pawn Stars died. There was a rumor that Leonard Shaffer, a coin expert, died but it is not true. He is alive & well. Tags: pawn stars, lists of actors.</code> | <code>Austin Russell, also known as Chumlee, star of History's reality series Pawn Stars, has died from an apparent heart attack, sources confirm to eBuzzd.</code> | <code>1.7131614685058594</code> |
+ * Loss: [<code>MarginMSELoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#marginmseloss)
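The <code>label</code> column above is a teacher-provided margin score, which MarginMSELoss regresses against: the student's score margin between the positive (sentence_1) and negative (sentence_2) passages should match the teacher's margin. A minimal sketch of the objective, assuming dot-product scoring of student embeddings; `margin_mse` and `dot` are hypothetical helpers for illustration, not the library implementation:

```python
# Sketch of the MarginMSE objective: mean squared error between the
# student's margin score(q, pos) - score(q, neg) and the teacher margin.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def margin_mse(q_emb, pos_emb, neg_emb, teacher_margins):
    """MSE between student margins and teacher margins over a batch."""
    total = 0.0
    for q, p, n, label in zip(q_emb, pos_emb, neg_emb, teacher_margins):
        student_margin = dot(q, p) - dot(q, n)
        total += (student_margin - label) ** 2
    return total / len(q_emb)

# Toy batch of one example: student scores 2.0 (pos) and 0.5 (neg),
# so the student margin is 1.5; the teacher margin is the label value
# from the first sample row above.
q = [[1.0, 0.0]]
pos = [[2.0, 0.0]]
neg = [[0.5, 0.0]]
loss = margin_mse(q, pos, neg, [2.6526598930358887])
print(loss)
```

Because only the *difference* of scores is supervised, absolute score scales can drift while the ranking signal distilled from the teacher is preserved.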
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `per_device_train_batch_size`: 64
+ - `per_device_eval_batch_size`: 64
+ - `num_train_epochs`: 30
+ - `fp16`: True
+ - `multi_dataset_batch_sampler`: round_robin
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: no
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 64
+ - `per_device_eval_batch_size`: 64
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1
+ - `num_train_epochs`: 30
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: round_robin
+
+ </details>
+
+ ### Training Logs
+ <details><summary>Click to expand</summary>
+
+ | Epoch | Step | Training Loss |
+ |:-------:|:------:|:-------------:|
+ | 0.0636 | 500 | 92.5416 |
+ | 0.1273 | 1000 | 20.6659 |
+ | 0.1909 | 1500 | 14.7631 |
+ | 0.2545 | 2000 | 14.3025 |
+ | 0.3181 | 2500 | 13.5257 |
+ | 0.3818 | 3000 | 12.8666 |
+ | 0.4454 | 3500 | 12.397 |
+ | 0.5090 | 4000 | 12.2718 |
+ | 0.5727 | 4500 | 11.539 |
+ | 0.6363 | 5000 | 11.1145 |
+ | 0.6999 | 5500 | 11.1232 |
+ | 0.7636 | 6000 | 10.6021 |
+ | 0.8272 | 6500 | 10.4115 |
+ | 0.8908 | 7000 | 10.4529 |
+ | 0.9544 | 7500 | 10.1329 |
+ | 1.0181 | 8000 | 10.1367 |
+ | 1.0817 | 8500 | 9.5914 |
+ | 1.1453 | 9000 | 9.2799 |
+ | 1.2090 | 9500 | 9.266 |
+ | 1.2726 | 10000 | 9.1661 |
+ | 1.3362 | 10500 | 8.954 |
+ | 1.3998 | 11000 | 8.9562 |
+ | 1.4635 | 11500 | 9.4717 |
+ | 1.5271 | 12000 | 8.6758 |
+ | 1.5907 | 12500 | 8.87 |
+ | 1.6544 | 13000 | 8.5826 |
+ | 1.7180 | 13500 | 8.4827 |
+ | 1.7816 | 14000 | 8.5306 |
+ | 1.8453 | 14500 | 8.182 |
+ | 1.9089 | 15000 | 8.3592 |
+ | 1.9725 | 15500 | 8.3879 |
+ | 2.0361 | 16000 | 7.4399 |
+ | 2.0998 | 16500 | 7.0406 |
+ | 2.1634 | 17000 | 6.89 |
+ | 2.2270 | 17500 | 6.8651 |
+ | 2.2907 | 18000 | 6.8461 |
+ | 2.3543 | 18500 | 6.7663 |
+ | 2.4179 | 19000 | 6.9313 |
+ | 2.4815 | 19500 | 6.9688 |
+ | 2.5452 | 20000 | 6.7821 |
+ | 2.6088 | 20500 | 6.9468 |
+ | 2.6724 | 21000 | 6.731 |
+ | 2.7361 | 21500 | 6.649 |
+ | 2.7997 | 22000 | 6.7055 |
+ | 2.8633 | 22500 | 6.7744 |
+ | 2.9270 | 23000 | 6.9481 |
+ | 2.9906 | 23500 | 6.5967 |
+ | 3.0542 | 24000 | 5.7351 |
+ | 3.1178 | 24500 | 5.4125 |
+ | 3.1815 | 25000 | 5.4095 |
+ | 3.2451 | 25500 | 5.4253 |
+ | 3.3087 | 26000 | 5.3774 |
+ | 3.3724 | 26500 | 5.5277 |
+ | 3.4360 | 27000 | 5.4516 |
+ | 3.4996 | 27500 | 5.322 |
+ | 3.5632 | 28000 | 5.5531 |
+ | 3.6269 | 28500 | 5.5238 |
+ | 3.6905 | 29000 | 5.5992 |
+ | 3.7541 | 29500 | 5.5351 |
+ | 3.8178 | 30000 | 5.3985 |
+ | 3.8814 | 30500 | 5.4313 |
+ | 3.9450 | 31000 | 5.4173 |
+ | 4.0087 | 31500 | 5.2333 |
+ | 4.0723 | 32000 | 4.3352 |
+ | 4.1359 | 32500 | 4.3442 |
+ | 4.1995 | 33000 | 4.3288 |
+ | 4.2632 | 33500 | 4.367 |
+ | 4.3268 | 34000 | 4.4607 |
+ | 4.3904 | 34500 | 4.4461 |
+ | 4.4541 | 35000 | 4.6218 |
+ | 4.5177 | 35500 | 4.4249 |
+ | 4.5813 | 36000 | 4.4129 |
+ | 4.6449 | 36500 | 4.4065 |
+ | 4.7086 | 37000 | 4.5452 |
+ | 4.7722 | 37500 | 4.5411 |
+ | 4.8358 | 38000 | 4.5423 |
+ | 4.8995 | 38500 | 4.4942 |
+ | 4.9631 | 39000 | 4.5332 |
+ | 5.0267 | 39500 | 4.0759 |
+ | 5.0904 | 40000 | 3.6274 |
+ | 5.1540 | 40500 | 3.6795 |
+ | 5.2176 | 41000 | 3.6741 |
+ | 5.2812 | 41500 | 3.7396 |
+ | 5.3449 | 42000 | 3.6839 |
+ | 5.4085 | 42500 | 3.732 |
+ | 5.4721 | 43000 | 3.6557 |
+ | 5.5358 | 43500 | 3.6925 |
+ | 5.5994 | 44000 | 3.7149 |
+ | 5.6630 | 44500 | 3.6744 |
+ | 5.7266 | 45000 | 3.7669 |
+ | 5.7903 | 45500 | 3.651 |
+ | 5.8539 | 46000 | 3.721 |
+ | 5.9175 | 46500 | 3.7012 |
+ | 5.9812 | 47000 | 3.7294 |
+ | 6.0448 | 47500 | 3.2432 |
+ | 6.1084 | 48000 | 3.0295 |
+ | 6.1721 | 48500 | 3.0364 |
+ | 6.2357 | 49000 | 3.0687 |
+ | 6.2993 | 49500 | 3.064 |
+ | 6.3629 | 50000 | 3.112 |
+ | 6.4266 | 50500 | 3.1438 |
+ | 6.4902 | 51000 | 3.0733 |
+ | 6.5538 | 51500 | 3.1719 |
+ | 6.6175 | 52000 | 3.1355 |
+ | 6.6811 | 52500 | 3.1612 |
+ | 6.7447 | 53000 | 3.1938 |
+ | 6.8083 | 53500 | 3.1375 |
+ | 6.8720 | 54000 | 3.1969 |
+ | 6.9356 | 54500 | 3.2214 |
+ | 6.9992 | 55000 | 3.1364 |
+ | 7.0629 | 55500 | 2.63 |
+ | 7.1265 | 56000 | 2.5451 |
+ | 7.1901 | 56500 | 2.644 |
+ | 7.2538 | 57000 | 2.6482 |
+ | 7.3174 | 57500 | 2.6017 |
+ | 7.3810 | 58000 | 2.6626 |
+ | 7.4446 | 58500 | 2.6698 |
+ | 7.5083 | 59000 | 2.6595 |
+ | 7.5719 | 59500 | 2.6683 |
+ | 7.6355 | 60000 | 2.7187 |
+ | 7.6992 | 60500 | 2.6213 |
+ | 7.7628 | 61000 | 2.7119 |
+ | 7.8264 | 61500 | 2.739 |
+ | 7.8900 | 62000 | 2.686 |
+ | 7.9537 | 62500 | 2.7295 |
+ | 8.0173 | 63000 | 2.6062 |
+ | 8.0809 | 63500 | 2.2272 |
+ | 8.1446 | 64000 | 2.2692 |
+ | 8.2082 | 64500 | 2.3135 |
+ | 8.2718 | 65000 | 2.2546 |
+ | 8.3355 | 65500 | 2.2882 |
+ | 8.3991 | 66000 | 2.2749 |
+ | 8.4627 | 66500 | 2.363 |
+ | 8.5263 | 67000 | 2.2923 |
+ | 8.5900 | 67500 | 2.3275 |
+ | 8.6536 | 68000 | 2.3738 |
+ | 8.7172 | 68500 | 2.3416 |
+ | 8.7809 | 69000 | 2.3851 |
+ | 8.8445 | 69500 | 2.3356 |
+ | 8.9081 | 70000 | 2.3598 |
+ | 8.9717 | 70500 | 2.4272 |
+ | 9.0354 | 71000 | 2.141 |
+ | 9.0990 | 71500 | 2.001 |
+ | 9.1626 | 72000 | 2.014 |
+ | 9.2263 | 72500 | 1.9826 |
+ | 9.2899 | 73000 | 1.995 |
+ | 9.3535 | 73500 | 2.0097 |
+ | 9.4172 | 74000 | 2.0412 |
+ | 9.4808 | 74500 | 2.0144 |
+ | 9.5444 | 75000 | 2.0653 |
+ | 9.6080 | 75500 | 2.022 |
+ | 9.6717 | 76000 | 2.0327 |
+ | 9.7353 | 76500 | 2.0596 |
+ | 9.7989 | 77000 | 2.0761 |
+ | 9.8626 | 77500 | 2.1245 |
+ | 9.9262 | 78000 | 2.1062 |
+ | 9.9898 | 78500 | 2.1186 |
+ | 10.0534 | 79000 | 1.8283 |
+ | 10.1171 | 79500 | 1.7627 |
+ | 10.1807 | 80000 | 1.7775 |
+ | 10.2443 | 80500 | 1.7865 |
+ | 10.3080 | 81000 | 1.8018 |
+ | 10.3716 | 81500 | 1.7851 |
+ | 10.4352 | 82000 | 1.8085 |
+ | 10.4989 | 82500 | 1.8293 |
+ | 10.5625 | 83000 | 1.8549 |
+ | 10.6261 | 83500 | 1.8531 |
+ | 10.6897 | 84000 | 1.8538 |
+ | 10.7534 | 84500 | 1.8814 |
+ | 10.8170 | 85000 | 1.8576 |
+ | 10.8806 | 85500 | 1.8516 |
+ | 10.9443 | 86000 | 1.8555 |
+ | 11.0079 | 86500 | 1.8631 |
+ | 11.0715 | 87000 | 1.6189 |
+ | 11.1351 | 87500 | 1.6143 |
+ | 11.1988 | 88000 | 1.6246 |
+ | 11.2624 | 88500 | 1.5997 |
+ | 11.3260 | 89000 | 1.646 |
+ | 11.3897 | 89500 | 1.6323 |
+ | 11.4533 | 90000 | 1.6623 |
+ | 11.5169 | 90500 | 1.6544 |
+ | 11.5806 | 91000 | 1.6671 |
+ | 11.6442 | 91500 | 1.6742 |
+ | 11.7078 | 92000 | 1.6409 |
+ | 11.7714 | 92500 | 1.6504 |
+ | 11.8351 | 93000 | 1.6791 |
+ | 11.8987 | 93500 | 1.6923 |
+ | 11.9623 | 94000 | 1.697 |
+ | 12.0260 | 94500 | 1.6136 |
+ | 12.0896 | 95000 | 1.4437 |
+ | 12.1532 | 95500 | 1.49 |
+ | 12.2168 | 96000 | 1.4567 |
+ | 12.2805 | 96500 | 1.5007 |
+ | 12.3441 | 97000 | 1.4826 |
+ | 12.4077 | 97500 | 1.4668 |
+ | 12.4714 | 98000 | 1.5009 |
+ | 12.5350 | 98500 | 1.5008 |
+ | 12.5986 | 99000 | 1.5336 |
+ | 12.6623 | 99500 | 1.5057 |
+ | 12.7259 | 100000 | 1.5081 |
+ | 12.7895 | 100500 | 1.5402 |
+ | 12.8531 | 101000 | 1.5519 |
+ | 12.9168 | 101500 | 1.5171 |
+ | 12.9804 | 102000 | 1.5249 |
+ | 13.0440 | 102500 | 1.4117 |
+ | 13.1077 | 103000 | 1.3524 |
+ | 13.1713 | 103500 | 1.3564 |
+ | 13.2349 | 104000 | 1.3483 |
+ | 13.2985 | 104500 | 1.386 |
+ | 13.3622 | 105000 | 1.3723 |
+ | 13.4258 | 105500 | 1.3933 |
+ | 13.4894 | 106000 | 1.3672 |
+ | 13.5531 | 106500 | 1.3796 |
+ | 13.6167 | 107000 | 1.3637 |
+ | 13.6803 | 107500 | 1.4061 |
+ | 13.7440 | 108000 | 1.3897 |
+ | 13.8076 | 108500 | 1.4342 |
+ | 13.8712 | 109000 | 1.3821 |
+ | 13.9348 | 109500 | 1.411 |
+ | 13.9985 | 110000 | 1.4214 |
+ | 14.0621 | 110500 | 1.2551 |
+ | 14.1257 | 111000 | 1.2366 |
+ | 14.1894 | 111500 | 1.2553 |
+ | 14.2530 | 112000 | 1.2553 |
+ | 14.3166 | 112500 | 1.2624 |
+ | 14.3802 | 113000 | 1.2771 |
+ | 14.4439 | 113500 | 1.2744 |
+ | 14.5075 | 114000 | 1.2616 |
+ | 14.5711 | 114500 | 1.2744 |
+ | 14.6348 | 115000 | 1.2705 |
+ | 14.6984 | 115500 | 1.3005 |
+ | 14.7620 | 116000 | 1.3013 |
+ | 14.8257 | 116500 | 1.298 |
+ | 14.8893 | 117000 | 1.2972 |
+ | 14.9529 | 117500 | 1.277 |
+ | 15.0165 | 118000 | 1.2718 |
+ | 15.0802 | 118500 | 1.1697 |
+ | 15.1438 | 119000 | 1.1819 |
+ | 15.2074 | 119500 | 1.1916 |
+ | 15.2711 | 120000 | 1.1829 |
+ | 15.3347 | 120500 | 1.1632 |
+ | 15.3983 | 121000 | 1.1809 |
+ | 15.4619 | 121500 | 1.1913 |
+ | 15.5256 | 122000 | 1.1916 |
+ | 15.5892 | 122500 | 1.1969 |
+ | 15.6528 | 123000 | 1.1929 |
+ | 15.7165 | 123500 | 1.2086 |
+ | 15.7801 | 124000 | 1.1864 |
+ | 15.8437 | 124500 | 1.2068 |
+ | 15.9074 | 125000 | 1.2253 |
+ | 15.9710 | 125500 | 1.1963 |
+ | 16.0346 | 126000 | 1.1585 |
+ | 16.0982 | 126500 | 1.0834 |
+ | 16.1619 | 127000 | 1.0937 |
+ | 16.2255 | 127500 | 1.0995 |
+ | 16.2891 | 128000 | 1.0787 |
+ | 16.3528 | 128500 | 1.1217 |
+ | 16.4164 | 129000 | 1.1185 |
+ | 16.4800 | 129500 | 1.1203 |
+ | 16.5436 | 130000 | 1.1201 |
+ | 16.6073 | 130500 | 1.125 |
+ | 16.6709 | 131000 | 1.1214 |
+ | 16.7345 | 131500 | 1.1228 |
+ | 16.7982 | 132000 | 1.1381 |
+ | 16.8618 | 132500 | 1.1414 |
+ | 16.9254 | 133000 | 1.123 |
+ | 16.9891 | 133500 | 1.1003 |
+ | 17.0527 | 134000 | 1.0447 |
+ | 17.1163 | 134500 | 1.036 |
+ | 17.1799 | 135000 | 1.0264 |
+ | 17.2436 | 135500 | 1.0375 |
+ | 17.3072 | 136000 | 1.0509 |
+ | 17.3708 | 136500 | 1.0452 |
+ | 17.4345 | 137000 | 1.0519 |
+ | 17.4981 | 137500 | 1.0498 |
+ | 17.5617 | 138000 | 1.0514 |
+ | 17.6253 | 138500 | 1.054 |
+ | 17.6890 | 139000 | 1.0457 |
+ | 17.7526 | 139500 | 1.0582 |
+ | 17.8162 | 140000 | 1.0566 |
+ | 17.8799 | 140500 | 1.0644 |
+ | 17.9435 | 141000 | 1.0579 |
+ | 18.0071 | 141500 | 1.0647 |
+ | 18.0708 | 142000 | 0.9704 |
+ | 18.1344 | 142500 | 0.9787 |
+ | 18.1980 | 143000 | 0.9875 |
+ | 18.2616 | 143500 | 0.987 |
+ | 18.3253 | 144000 | 0.9834 |
+ | 18.3889 | 144500 | 0.999 |
+ | 18.4525 | 145000 | 0.9872 |
+ | 18.5162 | 145500 | 0.9851 |
+ | 18.5798 | 146000 | 0.9986 |
+ | 18.6434 | 146500 | 0.9853 |
+ | 18.7071 | 147000 | 0.9973 |
+ | 18.7707 | 147500 | 0.988 |
+ | 18.8343 | 148000 | 0.999 |
+ | 18.8979 | 148500 | 0.9899 |
+ | 18.9616 | 149000 | 1.0053 |
+ | 19.0252 | 149500 | 0.9802 |
+ | 19.0888 | 150000 | 0.9301 |
+ | 19.1525 | 150500 | 0.9295 |
+ | 19.2161 | 151000 | 0.9334 |
+ | 19.2797 | 151500 | 0.9503 |
+ | 19.3433 | 152000 | 0.9161 |
+ | 19.4070 | 152500 | 0.9433 |
+ | 19.4706 | 153000 | 0.9376 |
+ | 19.5342 | 153500 | 0.9274 |
+ | 19.5979 | 154000 | 0.9414 |
+ | 19.6615 | 154500 | 0.94 |
+ | 19.7251 | 155000 | 0.9344 |
+ | 19.7888 | 155500 | 0.9464 |
+ | 19.8524 | 156000 | 0.9583 |
+ | 19.9160 | 156500 | 0.953 |
+ | 19.9796 | 157000 | 0.9481 |
+ | 20.0433 | 157500 | 0.8982 |
+ | 20.1069 | 158000 | 0.8974 |
+ | 20.1705 | 158500 | 0.9022 |
+ | 20.2342 | 159000 | 0.8923 |
+ | 20.2978 | 159500 | 0.8935 |
+ | 20.3614 | 160000 | 0.8917 |
+ | 20.4250 | 160500 | 0.9021 |
+ | 20.4887 | 161000 | 0.8978 |
+ | 20.5523 | 161500 | 0.9078 |
+ | 20.6159 | 162000 | 0.903 |
+ | 20.6796 | 162500 | 0.8989 |
+ | 20.7432 | 163000 | 0.9023 |
+ | 20.8068 | 163500 | 0.8918 |
+ | 20.8705 | 164000 | 0.8968 |
+ | 20.9341 | 164500 | 0.8977 |
+ | 20.9977 | 165000 | 0.9035 |
+ | 21.0613 | 165500 | 0.8347 |
+ | 21.1250 | 166000 | 0.8415 |
+ | 21.1886 | 166500 | 0.8472 |
+ | 21.2522 | 167000 | 0.8663 |
+ | 21.3159 | 167500 | 0.8633 |
+ | 21.3795 | 168000 | 0.8569 |
+ | 21.4431 | 168500 | 0.8529 |
+ | 21.5067 | 169000 | 0.8485 |
+ | 21.5704 | 169500 | 0.8759 |
+ | 21.6340 | 170000 | 0.8667 |
695
+ | 21.6976 | 170500 | 0.8615 |
696
+ | 21.7613 | 171000 | 0.8623 |
697
+ | 21.8249 | 171500 | 0.8613 |
698
+ | 21.8885 | 172000 | 0.8515 |
699
+ | 21.9522 | 172500 | 0.8615 |
700
+ | 22.0158 | 173000 | 0.8457 |
701
+ | 22.0794 | 173500 | 0.8106 |
702
+ | 22.1430 | 174000 | 0.8109 |
703
+ | 22.2067 | 174500 | 0.8108 |
704
+ | 22.2703 | 175000 | 0.8197 |
705
+ | 22.3339 | 175500 | 0.8165 |
706
+ | 22.3976 | 176000 | 0.8289 |
707
+ | 22.4612 | 176500 | 0.8288 |
708
+ | 22.5248 | 177000 | 0.8145 |
709
+ | 22.5884 | 177500 | 0.8249 |
710
+ | 22.6521 | 178000 | 0.8218 |
711
+ | 22.7157 | 178500 | 0.8284 |
712
+ | 22.7793 | 179000 | 0.833 |
713
+ | 22.8430 | 179500 | 0.8176 |
714
+ | 22.9066 | 180000 | 0.8431 |
715
+ | 22.9702 | 180500 | 0.8234 |
716
+ | 23.0339 | 181000 | 0.7998 |
717
+ | 23.0975 | 181500 | 0.7821 |
718
+ | 23.1611 | 182000 | 0.7914 |
719
+ | 23.2247 | 182500 | 0.7851 |
720
+ | 23.2884 | 183000 | 0.7797 |
721
+ | 23.3520 | 183500 | 0.7931 |
722
+ | 23.4156 | 184000 | 0.7912 |
723
+ | 23.4793 | 184500 | 0.7876 |
724
+ | 23.5429 | 185000 | 0.7954 |
725
+ | 23.6065 | 185500 | 0.7946 |
726
+ | 23.6701 | 186000 | 0.7782 |
727
+ | 23.7338 | 186500 | 0.7952 |
728
+ | 23.7974 | 187000 | 0.8015 |
729
+ | 23.8610 | 187500 | 0.7977 |
730
+ | 23.9247 | 188000 | 0.7875 |
731
+ | 23.9883 | 188500 | 0.7935 |
732
+ | 24.0519 | 189000 | 0.7617 |
733
+ | 24.1156 | 189500 | 0.7625 |
734
+ | 24.1792 | 190000 | 0.7514 |
735
+ | 24.2428 | 190500 | 0.7662 |
736
+ | 24.3064 | 191000 | 0.7692 |
737
+ | 24.3701 | 191500 | 0.7733 |
738
+ | 24.4337 | 192000 | 0.7561 |
739
+ | 24.4973 | 192500 | 0.7577 |
740
+ | 24.5610 | 193000 | 0.7687 |
741
+ | 24.6246 | 193500 | 0.7647 |
742
+ | 24.6882 | 194000 | 0.7717 |
743
+ | 24.7518 | 194500 | 0.761 |
744
+ | 24.8155 | 195000 | 0.7661 |
745
+ | 24.8791 | 195500 | 0.7446 |
746
+ | 24.9427 | 196000 | 0.7659 |
747
+ | 25.0064 | 196500 | 0.7559 |
748
+ | 25.0700 | 197000 | 0.7183 |
749
+ | 25.1336 | 197500 | 0.7399 |
750
+ | 25.1973 | 198000 | 0.7308 |
751
+ | 25.2609 | 198500 | 0.733 |
752
+ | 25.3245 | 199000 | 0.746 |
753
+ | 25.3881 | 199500 | 0.7274 |
754
+ | 25.4518 | 200000 | 0.7358 |
755
+ | 25.5154 | 200500 | 0.7468 |
756
+ | 25.5790 | 201000 | 0.734 |
757
+ | 25.6427 | 201500 | 0.7493 |
758
+ | 25.7063 | 202000 | 0.7263 |
759
+ | 25.7699 | 202500 | 0.7355 |
760
+ | 25.8335 | 203000 | 0.745 |
761
+ | 25.8972 | 203500 | 0.7301 |
762
+ | 25.9608 | 204000 | 0.7457 |
763
+ | 26.0244 | 204500 | 0.7072 |
764
+ | 26.0881 | 205000 | 0.7212 |
765
+ | 26.1517 | 205500 | 0.7186 |
766
+ | 26.2153 | 206000 | 0.7225 |
767
+ | 26.2790 | 206500 | 0.7065 |
768
+ | 26.3426 | 207000 | 0.7153 |
769
+ | 26.4062 | 207500 | 0.72 |
770
+ | 26.4698 | 208000 | 0.7074 |
771
+ | 26.5335 | 208500 | 0.7117 |
772
+ | 26.5971 | 209000 | 0.7206 |
773
+ | 26.6607 | 209500 | 0.7132 |
774
+ | 26.7244 | 210000 | 0.7199 |
775
+ | 26.7880 | 210500 | 0.7102 |
776
+ | 26.8516 | 211000 | 0.7155 |
777
+ | 26.9152 | 211500 | 0.7057 |
778
+ | 26.9789 | 212000 | 0.7191 |
779
+ | 27.0425 | 212500 | 0.6942 |
780
+ | 27.1061 | 213000 | 0.6924 |
781
+ | 27.1698 | 213500 | 0.7025 |
782
+ | 27.2334 | 214000 | 0.6911 |
783
+ | 27.2970 | 214500 | 0.6955 |
784
+ | 27.3607 | 215000 | 0.6875 |
785
+ | 27.4243 | 215500 | 0.698 |
786
+ | 27.4879 | 216000 | 0.7054 |
787
+ | 27.5515 | 216500 | 0.6968 |
788
+ | 27.6152 | 217000 | 0.7044 |
789
+ | 27.6788 | 217500 | 0.6946 |
790
+ | 27.7424 | 218000 | 0.6865 |
791
+ | 27.8061 | 218500 | 0.6974 |
792
+ | 27.8697 | 219000 | 0.698 |
793
+ | 27.9333 | 219500 | 0.6943 |
794
+ | 27.9969 | 220000 | 0.6985 |
795
+ | 28.0606 | 220500 | 0.6785 |
796
+ | 28.1242 | 221000 | 0.6842 |
797
+ | 28.1878 | 221500 | 0.6832 |
798
+ | 28.2515 | 222000 | 0.6863 |
799
+ | 28.3151 | 222500 | 0.6806 |
800
+ | 28.3787 | 223000 | 0.6897 |
801
+ | 28.4424 | 223500 | 0.6975 |
802
+ | 28.5060 | 224000 | 0.6802 |
803
+ | 28.5696 | 224500 | 0.6836 |
804
+ | 28.6332 | 225000 | 0.6849 |
805
+ | 28.6969 | 225500 | 0.6781 |
806
+ | 28.7605 | 226000 | 0.6761 |
807
+ | 28.8241 | 226500 | 0.6762 |
808
+ | 28.8878 | 227000 | 0.6781 |
809
+ | 28.9514 | 227500 | 0.682 |
810
+ | 29.0150 | 228000 | 0.6742 |
811
+ | 29.0786 | 228500 | 0.6595 |
812
+ | 29.1423 | 229000 | 0.683 |
813
+ | 29.2059 | 229500 | 0.6721 |
814
+ | 29.2695 | 230000 | 0.669 |
815
+ | 29.3332 | 230500 | 0.683 |
816
+ | 29.3968 | 231000 | 0.6652 |
817
+ | 29.4604 | 231500 | 0.671 |
818
+ | 29.5241 | 232000 | 0.6662 |
819
+ | 29.5877 | 232500 | 0.6665 |
820
+ | 29.6513 | 233000 | 0.6718 |
821
+ | 29.7149 | 233500 | 0.6657 |
822
+ | 29.7786 | 234000 | 0.6677 |
823
+ | 29.8422 | 234500 | 0.6732 |
824
+ | 29.9058 | 235000 | 0.6687 |
825
+ | 29.9695 | 235500 | 0.6732 |
826
+
827
+ </details>
+
+ ### Framework Versions
+ - Python: 3.11.5
+ - Sentence Transformers: 3.4.0
+ - Transformers: 4.48.0
+ - PyTorch: 2.5.1+cu124
+ - Accelerate: 1.2.1
+ - Datasets: 2.21.0
+ - Tokenizers: 0.21.0
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+ author = "Reimers, Nils and Gurevych, Iryna",
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+ month = "11",
+ year = "2019",
+ publisher = "Association for Computational Linguistics",
+ url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MarginMSELoss
+ ```bibtex
+ @misc{hofstätter2021improving,
+ title={Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation},
+ author={Sebastian Hofstätter and Sophia Althammer and Michael Schröder and Mete Sertkan and Allan Hanbury},
+ year={2021},
+ eprint={2010.02666},
+ archivePrefix={arXiv},
+ primaryClass={cs.IR}
+ }
+ ```
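The MarginMSELoss cited above distills a cross-encoder teacher into this bi-encoder by regressing the student's score *margin* between a positive and a negative passage onto the teacher's margin, rather than matching absolute scores. A minimal sketch of that objective (plain Python; the scores and function name are illustrative, not the library's API):

```python
def margin_mse(student_pos: float, student_neg: float,
               teacher_pos: float, teacher_neg: float) -> float:
    """Squared error between the student margin and the teacher margin."""
    student_margin = student_pos - student_neg
    teacher_margin = teacher_pos - teacher_neg
    return (student_margin - teacher_margin) ** 2

# Student separates the pair by 2.0, teacher by 3.0:
# the 1.0 gap is penalised quadratically.
print(margin_mse(8.0, 6.0, 9.0, 6.0))  # → 1.0
```

Because only margins are matched, the student is free to produce scores on its own scale, which is part of why this loss works well for dot-product and cosine bi-encoders.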
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,28 @@
+ {
+ "_name_or_path": "FacebookAI/xlm-roberta-base",
+ "architectures": [
+ "XLMRobertaModel"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "bos_token_id": 0,
+ "classifier_dropout": null,
+ "eos_token_id": 2,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "layer_norm_eps": 1e-05,
+ "max_position_embeddings": 514,
+ "model_type": "xlm-roberta",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "output_past": true,
+ "pad_token_id": 1,
+ "position_embedding_type": "absolute",
+ "torch_dtype": "float32",
+ "transformers_version": "4.48.0",
+ "type_vocab_size": 1,
+ "use_cache": true,
+ "vocab_size": 250002
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "__version__": {
+ "sentence_transformers": "3.4.0",
+ "transformers": "4.48.0",
+ "pytorch": "2.5.1+cu124"
+ },
+ "prompts": {},
+ "default_prompt_name": null,
+ "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2fe1e3055498083e90a3f1453646fc544a68188705df6e75f517b52c3cfd690d
+ size 1112197096
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_Pooling",
+ "type": "sentence_transformers.models.Pooling"
+ }
+ ]
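The module list above chains a `Transformer` encoder with the `Pooling` layer in `1_Pooling/`, which is configured for mean pooling over token embeddings, and `config_sentence_transformers.json` selects cosine similarity. A minimal NumPy sketch of those two post-encoder steps (shapes and values are illustrative, not real model outputs):

```python
import numpy as np

def mean_pool(token_embs: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over the sequence, ignoring padding (mask == 0)."""
    m = mask[:, :, None].astype(token_embs.dtype)   # (batch, seq, 1)
    summed = (token_embs * m).sum(axis=1)           # (batch, dim)
    counts = np.clip(m.sum(axis=1), 1e-9, None)     # avoid division by zero
    return summed / counts

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two toy "sentences" with dim-2 token embeddings;
# the second token of sentence 0 is padding and must not affect the mean.
embs = np.array([[[1.0, 0.0], [9.0, 9.0]],
                 [[0.0, 2.0], [0.0, 4.0]]])
mask = np.array([[1, 0], [1, 1]])
pooled = mean_pool(embs, mask)           # [[1, 0], [0, 3]]
print(cosine_sim(pooled[0], pooled[1]))  # → 0.0 (orthogonal embeddings)
```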
optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0125588937b8740e8240aa78088a3b6f96bd9d26a3da776e5b162b19782ddf4d
+ size 2219789306
rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dfbcefc444f25d5d15fdcaea37d455ee2fd6081b8c6c282dee2f684bb5199b34
+ size 14244
scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fd581da3737aeb8c423dad79b6940d6e6b68723f70a159fc7bf34439f69f3e05
+ size 1064
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 512,
+ "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,15 @@
+ {
+ "bos_token": "<s>",
+ "cls_token": "<s>",
+ "eos_token": "</s>",
+ "mask_token": {
+ "content": "<mask>",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": "<pad>",
+ "sep_token": "</s>",
+ "unk_token": "<unk>"
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:883b037111086fd4dfebbbc9b7cee11e1517b5e0c0514879478661440f137085
+ size 17082987
tokenizer_config.json ADDED
@@ -0,0 +1,55 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "250001": {
+ "content": "<mask>",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<s>",
+ "clean_up_tokenization_spaces": false,
+ "cls_token": "<s>",
+ "eos_token": "</s>",
+ "extra_special_tokens": {},
+ "mask_token": "<mask>",
+ "model_max_length": 512,
+ "pad_token": "<pad>",
+ "sep_token": "</s>",
+ "tokenizer_class": "XLMRobertaTokenizer",
+ "unk_token": "<unk>"
+ }
trainer_state.json ADDED
@@ -0,0 +1,3330 @@
+ {
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 30.0,
+ "eval_steps": 0,
+ "global_step": 235740,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.06362942224484602,
+ "grad_norm": 1662.35009765625,
+ "learning_rate": 4.86e-07,
+ "loss": 92.5416,
+ "step": 500
+ },
+ {
+ "epoch": 0.12725884448969205,
+ "grad_norm": 288.2401428222656,
+ "learning_rate": 9.86e-07,
+ "loss": 20.6659,
+ "step": 1000
+ },
+ {
+ "epoch": 0.19088826673453804,
+ "grad_norm": 56.13795852661133,
+ "learning_rate": 1.4860000000000003e-06,
+ "loss": 14.7631,
+ "step": 1500
+ },
+ {
+ "epoch": 0.2545176889793841,
+ "grad_norm": 102.28019714355469,
+ "learning_rate": 1.986e-06,
+ "loss": 14.3025,
+ "step": 2000
+ },
+ {
+ "epoch": 0.31814711122423006,
+ "grad_norm": 155.62403869628906,
+ "learning_rate": 2.486e-06,
+ "loss": 13.5257,
+ "step": 2500
+ },
+ {
+ "epoch": 0.3817765334690761,
+ "grad_norm": 210.75811767578125,
+ "learning_rate": 2.986e-06,
+ "loss": 12.8666,
+ "step": 3000
+ },
+ {
+ "epoch": 0.4454059557139221,
+ "grad_norm": 256.039306640625,
+ "learning_rate": 3.4860000000000006e-06,
+ "loss": 12.397,
+ "step": 3500
+ },
+ {
+ "epoch": 0.5090353779587682,
+ "grad_norm": 227.79017639160156,
+ "learning_rate": 3.9860000000000005e-06,
+ "loss": 12.2718,
+ "step": 4000
+ },
+ {
+ "epoch": 0.5726648002036141,
+ "grad_norm": 307.928955078125,
+ "learning_rate": 4.486000000000001e-06,
+ "loss": 11.539,
+ "step": 4500
+ },
+ {
+ "epoch": 0.6362942224484601,
+ "grad_norm": 199.85580444335938,
+ "learning_rate": 4.986e-06,
+ "loss": 11.1145,
+ "step": 5000
+ },
+ {
+ "epoch": 0.6999236446933061,
+ "grad_norm": 236.899169921875,
+ "learning_rate": 5.4860000000000005e-06,
+ "loss": 11.1232,
+ "step": 5500
+ },
+ {
+ "epoch": 0.7635530669381522,
+ "grad_norm": 265.123046875,
+ "learning_rate": 5.986000000000001e-06,
+ "loss": 10.6021,
+ "step": 6000
+ },
+ {
+ "epoch": 0.8271824891829982,
+ "grad_norm": 254.1043701171875,
+ "learning_rate": 6.486e-06,
+ "loss": 10.4115,
+ "step": 6500
+ },
+ {
+ "epoch": 0.8908119114278442,
+ "grad_norm": 172.3489990234375,
+ "learning_rate": 6.9860000000000005e-06,
+ "loss": 10.4529,
+ "step": 7000
+ },
+ {
+ "epoch": 0.9544413336726902,
+ "grad_norm": 374.72003173828125,
+ "learning_rate": 7.486000000000001e-06,
+ "loss": 10.1329,
+ "step": 7500
+ },
+ {
+ "epoch": 1.0180707559175364,
+ "grad_norm": 320.3682556152344,
+ "learning_rate": 7.985e-06,
+ "loss": 10.1367,
+ "step": 8000
+ },
+ {
+ "epoch": 1.0817001781623823,
+ "grad_norm": 297.0594787597656,
+ "learning_rate": 8.485000000000001e-06,
+ "loss": 9.5914,
+ "step": 8500
+ },
+ {
+ "epoch": 1.1453296004072282,
+ "grad_norm": 266.2686767578125,
+ "learning_rate": 8.985000000000001e-06,
+ "loss": 9.2799,
+ "step": 9000
+ },
+ {
+ "epoch": 1.2089590226520743,
+ "grad_norm": 168.0514373779297,
+ "learning_rate": 9.485000000000002e-06,
+ "loss": 9.266,
+ "step": 9500
+ },
+ {
+ "epoch": 1.2725884448969205,
+ "grad_norm": 213.7965545654297,
+ "learning_rate": 9.985000000000002e-06,
+ "loss": 9.1661,
+ "step": 10000
+ },
+ {
+ "epoch": 1.3362178671417664,
+ "grad_norm": 189.05682373046875,
+ "learning_rate": 9.978515105874015e-06,
+ "loss": 8.954,
+ "step": 10500
+ },
+ {
+ "epoch": 1.3998472893866123,
+ "grad_norm": 230.05084228515625,
+ "learning_rate": 9.956365730486402e-06,
+ "loss": 8.9562,
+ "step": 11000
+ },
+ {
+ "epoch": 1.4634767116314584,
+ "grad_norm": 314.4221496582031,
+ "learning_rate": 9.934304952600337e-06,
+ "loss": 9.4717,
+ "step": 11500
+ },
+ {
+ "epoch": 1.5271061338763046,
+ "grad_norm": 190.9048614501953,
+ "learning_rate": 9.912155577212723e-06,
+ "loss": 8.6758,
+ "step": 12000
+ },
+ {
+ "epoch": 1.5907355561211505,
+ "grad_norm": 3140.1875,
+ "learning_rate": 9.89000620182511e-06,
+ "loss": 8.87,
+ "step": 12500
+ },
+ {
+ "epoch": 1.6543649783659964,
+ "grad_norm": 396.64117431640625,
+ "learning_rate": 9.867856826437496e-06,
+ "loss": 8.5826,
+ "step": 13000
+ },
+ {
+ "epoch": 1.7179944006108423,
+ "grad_norm": 171.70077514648438,
+ "learning_rate": 9.845707451049881e-06,
+ "loss": 8.4827,
+ "step": 13500
+ },
+ {
+ "epoch": 1.7816238228556884,
+ "grad_norm": 269.8551940917969,
+ "learning_rate": 9.823558075662267e-06,
+ "loss": 8.5306,
+ "step": 14000
+ },
+ {
+ "epoch": 1.8452532451005346,
+ "grad_norm": 255.013671875,
+ "learning_rate": 9.801408700274653e-06,
+ "loss": 8.182,
+ "step": 14500
+ },
+ {
+ "epoch": 1.9088826673453805,
+ "grad_norm": 194.22486877441406,
+ "learning_rate": 9.77925932488704e-06,
+ "loss": 8.3592,
+ "step": 15000
+ },
+ {
+ "epoch": 1.9725120895902264,
+ "grad_norm": 149.85800170898438,
+ "learning_rate": 9.757109949499426e-06,
+ "loss": 8.3879,
+ "step": 15500
+ },
+ {
+ "epoch": 2.0361415118350727,
+ "grad_norm": 156.6005401611328,
+ "learning_rate": 9.735004872862585e-06,
+ "loss": 7.4399,
+ "step": 16000
+ },
+ {
+ "epoch": 2.0997709340799187,
+ "grad_norm": 286.58648681640625,
+ "learning_rate": 9.712855497474972e-06,
+ "loss": 7.0406,
+ "step": 16500
+ },
+ {
+ "epoch": 2.1634003563247646,
+ "grad_norm": 242.3479461669922,
+ "learning_rate": 9.690706122087358e-06,
+ "loss": 6.89,
+ "step": 17000
+ },
+ {
+ "epoch": 2.2270297785696105,
+ "grad_norm": 180.5225372314453,
+ "learning_rate": 9.668556746699744e-06,
+ "loss": 6.8651,
+ "step": 17500
+ },
+ {
+ "epoch": 2.2906592008144564,
+ "grad_norm": 223.84552001953125,
+ "learning_rate": 9.64640737131213e-06,
+ "loss": 6.8461,
+ "step": 18000
+ },
+ {
+ "epoch": 2.3542886230593028,
+ "grad_norm": 233.3303680419922,
+ "learning_rate": 9.624257995924515e-06,
+ "loss": 6.7663,
+ "step": 18500
+ },
+ {
+ "epoch": 2.4179180453041487,
+ "grad_norm": 237.0810546875,
+ "learning_rate": 9.602108620536902e-06,
+ "loss": 6.9313,
+ "step": 19000
+ },
+ {
+ "epoch": 2.4815474675489946,
+ "grad_norm": 176.5728302001953,
+ "learning_rate": 9.579959245149288e-06,
+ "loss": 6.9688,
+ "step": 19500
+ },
+ {
+ "epoch": 2.545176889793841,
+ "grad_norm": 184.43077087402344,
+ "learning_rate": 9.557809869761674e-06,
+ "loss": 6.7821,
+ "step": 20000
+ },
+ {
+ "epoch": 2.608806312038687,
+ "grad_norm": 182.1748809814453,
+ "learning_rate": 9.535660494374059e-06,
+ "loss": 6.9468,
+ "step": 20500
+ },
+ {
+ "epoch": 2.6724357342835328,
+ "grad_norm": 232.06759643554688,
+ "learning_rate": 9.51355541773722e-06,
+ "loss": 6.731,
+ "step": 21000
+ },
+ {
+ "epoch": 2.7360651565283787,
+ "grad_norm": 169.12734985351562,
+ "learning_rate": 9.491406042349606e-06,
+ "loss": 6.649,
+ "step": 21500
+ },
+ {
+ "epoch": 2.7996945787732246,
+ "grad_norm": 153.9056854248047,
+ "learning_rate": 9.469256666961992e-06,
+ "loss": 6.7055,
+ "step": 22000
+ },
+ {
+ "epoch": 2.8633240010180705,
+ "grad_norm": 252.30517578125,
+ "learning_rate": 9.447107291574379e-06,
+ "loss": 6.7744,
+ "step": 22500
+ },
+ {
+ "epoch": 2.926953423262917,
+ "grad_norm": 182.51229858398438,
+ "learning_rate": 9.424957916186765e-06,
+ "loss": 6.9481,
+ "step": 23000
+ },
+ {
+ "epoch": 2.9905828455077628,
+ "grad_norm": 213.7582244873047,
+ "learning_rate": 9.40280854079915e-06,
+ "loss": 6.5967,
+ "step": 23500
+ },
+ {
+ "epoch": 3.0542122677526087,
+ "grad_norm": 187.1132049560547,
+ "learning_rate": 9.380659165411536e-06,
+ "loss": 5.7351,
+ "step": 24000
+ },
+ {
+ "epoch": 3.117841689997455,
+ "grad_norm": 157.81378173828125,
+ "learning_rate": 9.358509790023921e-06,
+ "loss": 5.4125,
+ "step": 24500
+ },
+ {
+ "epoch": 3.181471112242301,
+ "grad_norm": 448.2672424316406,
+ "learning_rate": 9.336360414636309e-06,
+ "loss": 5.4095,
+ "step": 25000
+ },
+ {
+ "epoch": 3.245100534487147,
+ "grad_norm": 170.9069061279297,
+ "learning_rate": 9.314211039248694e-06,
+ "loss": 5.4253,
+ "step": 25500
+ },
+ {
+ "epoch": 3.3087299567319928,
+ "grad_norm": 186.37034606933594,
+ "learning_rate": 9.29206166386108e-06,
+ "loss": 5.3774,
+ "step": 26000
+ },
+ {
+ "epoch": 3.3723593789768387,
+ "grad_norm": 134.44960021972656,
+ "learning_rate": 9.269912288473466e-06,
+ "loss": 5.5277,
+ "step": 26500
+ },
+ {
+ "epoch": 3.435988801221685,
+ "grad_norm": 268.1274108886719,
+ "learning_rate": 9.247807211836627e-06,
+ "loss": 5.4516,
+ "step": 27000
+ },
+ {
+ "epoch": 3.499618223466531,
+ "grad_norm": 248.0684814453125,
+ "learning_rate": 9.225657836449013e-06,
+ "loss": 5.322,
+ "step": 27500
+ },
+ {
+ "epoch": 3.563247645711377,
+ "grad_norm": 214.72317504882812,
+ "learning_rate": 9.203508461061398e-06,
+ "loss": 5.5531,
+ "step": 28000
+ },
+ {
+ "epoch": 3.626877067956223,
+ "grad_norm": 153.9894256591797,
+ "learning_rate": 9.181359085673784e-06,
+ "loss": 5.5238,
+ "step": 28500
+ },
+ {
+ "epoch": 3.690506490201069,
+ "grad_norm": 174.88331604003906,
+ "learning_rate": 9.159209710286171e-06,
+ "loss": 5.5992,
+ "step": 29000
+ },
+ {
+ "epoch": 3.754135912445915,
+ "grad_norm": 301.410888671875,
+ "learning_rate": 9.137104633649332e-06,
+ "loss": 5.5351,
+ "step": 29500
+ },
+ {
+ "epoch": 3.817765334690761,
+ "grad_norm": 201.53282165527344,
+ "learning_rate": 9.114955258261718e-06,
+ "loss": 5.3985,
+ "step": 30000
+ },
+ {
+ "epoch": 3.881394756935607,
+ "grad_norm": 212.6214141845703,
+ "learning_rate": 9.092805882874104e-06,
+ "loss": 5.4313,
+ "step": 30500
+ },
+ {
+ "epoch": 3.945024179180453,
+ "grad_norm": 177.44863891601562,
+ "learning_rate": 9.07065650748649e-06,
+ "loss": 5.4173,
+ "step": 31000
+ },
+ {
+ "epoch": 4.008653601425299,
+ "grad_norm": 160.0504150390625,
+ "learning_rate": 9.04855143084965e-06,
+ "loss": 5.2333,
+ "step": 31500
+ },
+ {
+ "epoch": 4.0722830236701455,
+ "grad_norm": 150.31857299804688,
+ "learning_rate": 9.026446354212812e-06,
+ "loss": 4.3352,
+ "step": 32000
+ },
+ {
+ "epoch": 4.135912445914991,
+ "grad_norm": 124.97169494628906,
+ "learning_rate": 9.004296978825197e-06,
+ "loss": 4.3442,
+ "step": 32500
+ },
+ {
+ "epoch": 4.199541868159837,
+ "grad_norm": 215.25157165527344,
+ "learning_rate": 8.982147603437585e-06,
+ "loss": 4.3288,
+ "step": 33000
+ },
+ {
+ "epoch": 4.263171290404683,
+ "grad_norm": 148.4134521484375,
+ "learning_rate": 8.95999822804997e-06,
+ "loss": 4.367,
+ "step": 33500
+ },
+ {
+ "epoch": 4.326800712649529,
+ "grad_norm": 204.40850830078125,
+ "learning_rate": 8.93789315141313e-06,
+ "loss": 4.4607,
+ "step": 34000
+ },
+ {
+ "epoch": 4.390430134894375,
+ "grad_norm": 164.64273071289062,
+ "learning_rate": 8.915743776025517e-06,
+ "loss": 4.4461,
+ "step": 34500
+ },
+ {
+ "epoch": 4.454059557139221,
+ "grad_norm": 204.80953979492188,
+ "learning_rate": 8.893594400637903e-06,
+ "loss": 4.6218,
+ "step": 35000
+ },
+ {
+ "epoch": 4.517688979384067,
+ "grad_norm": 185.70278930664062,
+ "learning_rate": 8.871445025250289e-06,
+ "loss": 4.4249,
+ "step": 35500
+ },
+ {
+ "epoch": 4.581318401628913,
+ "grad_norm": 202.91989135742188,
+ "learning_rate": 8.849295649862674e-06,
+ "loss": 4.4129,
+ "step": 36000
+ },
+ {
+ "epoch": 4.64494782387376,
+ "grad_norm": 164.02198791503906,
+ "learning_rate": 8.82714627447506e-06,
+ "loss": 4.4065,
+ "step": 36500
+ },
+ {
+ "epoch": 4.7085772461186055,
+ "grad_norm": 155.7901153564453,
+ "learning_rate": 8.804996899087447e-06,
+ "loss": 4.5452,
+ "step": 37000
+ },
+ {
+ "epoch": 4.772206668363451,
+ "grad_norm": 194.26280212402344,
+ "learning_rate": 8.782847523699833e-06,
+ "loss": 4.5411,
+ "step": 37500
+ },
+ {
+ "epoch": 4.835836090608297,
+ "grad_norm": 168.18798828125,
+ "learning_rate": 8.760698148312218e-06,
+ "loss": 4.5423,
+ "step": 38000
+ },
+ {
+ "epoch": 4.899465512853143,
+ "grad_norm": 136.41905212402344,
+ "learning_rate": 8.738548772924604e-06,
+ "loss": 4.4942,
+ "step": 38500
+ },
+ {
+ "epoch": 4.963094935097989,
+ "grad_norm": 141.8522491455078,
+ "learning_rate": 8.71639939753699e-06,
+ "loss": 4.5332,
+ "step": 39000
+ },
+ {
+ "epoch": 5.026724357342835,
+ "grad_norm": 149.42271423339844,
+ "learning_rate": 8.694250022149377e-06,
+ "loss": 4.0759,
+ "step": 39500
+ },
+ {
+ "epoch": 5.090353779587681,
+ "grad_norm": 139.2994842529297,
+ "learning_rate": 8.672100646761763e-06,
+ "loss": 3.6274,
+ "step": 40000
+ },
+ {
+ "epoch": 5.153983201832528,
+ "grad_norm": 140.65269470214844,
+ "learning_rate": 8.649951271374148e-06,
+ "loss": 3.6795,
+ "step": 40500
+ },
+ {
+ "epoch": 5.217612624077374,
+ "grad_norm": 139.22752380371094,
581
+ "learning_rate": 8.627801895986534e-06,
582
+ "loss": 3.6741,
583
+ "step": 41000
584
+ },
585
+ {
586
+ "epoch": 5.28124204632222,
587
+ "grad_norm": 93.71381378173828,
588
+ "learning_rate": 8.60565252059892e-06,
589
+ "loss": 3.7396,
590
+ "step": 41500
591
+ },
592
+ {
593
+ "epoch": 5.3448714685670655,
594
+ "grad_norm": 118.81936645507812,
595
+ "learning_rate": 8.583503145211307e-06,
596
+ "loss": 3.6839,
597
+ "step": 42000
598
+ },
599
+ {
600
+ "epoch": 5.408500890811911,
601
+ "grad_norm": 143.53829956054688,
602
+ "learning_rate": 8.561353769823692e-06,
603
+ "loss": 3.732,
604
+ "step": 42500
605
+ },
606
+ {
607
+ "epoch": 5.472130313056757,
608
+ "grad_norm": 152.01527404785156,
609
+ "learning_rate": 8.539248693186852e-06,
610
+ "loss": 3.6557,
611
+ "step": 43000
612
+ },
613
+ {
614
+ "epoch": 5.535759735301603,
615
+ "grad_norm": 159.16392517089844,
616
+ "learning_rate": 8.517143616550015e-06,
617
+ "loss": 3.6925,
618
+ "step": 43500
619
+ },
620
+ {
621
+ "epoch": 5.599389157546449,
622
+ "grad_norm": 143.2123260498047,
623
+ "learning_rate": 8.4949942411624e-06,
624
+ "loss": 3.7149,
625
+ "step": 44000
626
+ },
627
+ {
628
+ "epoch": 5.663018579791295,
629
+ "grad_norm": 136.5101318359375,
630
+ "learning_rate": 8.472844865774786e-06,
631
+ "loss": 3.6744,
632
+ "step": 44500
633
+ },
634
+ {
635
+ "epoch": 5.726648002036142,
636
+ "grad_norm": 156.95541381835938,
637
+ "learning_rate": 8.450695490387172e-06,
638
+ "loss": 3.7669,
639
+ "step": 45000
640
+ },
641
+ {
642
+ "epoch": 5.790277424280988,
643
+ "grad_norm": 137.13330078125,
644
+ "learning_rate": 8.428546114999557e-06,
645
+ "loss": 3.651,
646
+ "step": 45500
647
+ },
648
+ {
649
+ "epoch": 5.853906846525834,
650
+ "grad_norm": 149.19625854492188,
651
+ "learning_rate": 8.406396739611945e-06,
652
+ "loss": 3.721,
653
+ "step": 46000
654
+ },
655
+ {
656
+ "epoch": 5.91753626877068,
657
+ "grad_norm": 193.83432006835938,
658
+ "learning_rate": 8.384291662975104e-06,
659
+ "loss": 3.7012,
660
+ "step": 46500
661
+ },
662
+ {
663
+ "epoch": 5.9811656910155255,
664
+ "grad_norm": 149.3867950439453,
665
+ "learning_rate": 8.362186586338266e-06,
666
+ "loss": 3.7294,
667
+ "step": 47000
668
+ },
669
+ {
670
+ "epoch": 6.0447951132603714,
671
+ "grad_norm": 144.5869140625,
672
+ "learning_rate": 8.340037210950653e-06,
673
+ "loss": 3.2432,
674
+ "step": 47500
675
+ },
676
+ {
677
+ "epoch": 6.108424535505217,
678
+ "grad_norm": 138.15234375,
679
+ "learning_rate": 8.317887835563039e-06,
680
+ "loss": 3.0295,
681
+ "step": 48000
682
+ },
683
+ {
684
+ "epoch": 6.172053957750063,
685
+ "grad_norm": 544.6531372070312,
686
+ "learning_rate": 8.295738460175424e-06,
687
+ "loss": 3.0364,
688
+ "step": 48500
689
+ },
690
+ {
691
+ "epoch": 6.23568337999491,
692
+ "grad_norm": 124.35468292236328,
693
+ "learning_rate": 8.273633383538585e-06,
694
+ "loss": 3.0687,
695
+ "step": 49000
696
+ },
697
+ {
698
+ "epoch": 6.299312802239756,
699
+ "grad_norm": 93.38568878173828,
700
+ "learning_rate": 8.251484008150971e-06,
701
+ "loss": 3.064,
702
+ "step": 49500
703
+ },
704
+ {
705
+ "epoch": 6.362942224484602,
706
+ "grad_norm": 192.03231811523438,
707
+ "learning_rate": 8.229334632763357e-06,
708
+ "loss": 3.112,
709
+ "step": 50000
710
+ },
711
+ {
712
+ "epoch": 6.426571646729448,
713
+ "grad_norm": 107.92765808105469,
714
+ "learning_rate": 8.207185257375742e-06,
715
+ "loss": 3.1438,
716
+ "step": 50500
717
+ },
718
+ {
719
+ "epoch": 6.490201068974294,
720
+ "grad_norm": 124.23885345458984,
721
+ "learning_rate": 8.185080180738904e-06,
722
+ "loss": 3.0733,
723
+ "step": 51000
724
+ },
725
+ {
726
+ "epoch": 6.55383049121914,
727
+ "grad_norm": 154.87612915039062,
728
+ "learning_rate": 8.162930805351291e-06,
729
+ "loss": 3.1719,
730
+ "step": 51500
731
+ },
732
+ {
733
+ "epoch": 6.6174599134639855,
734
+ "grad_norm": 134.2186737060547,
735
+ "learning_rate": 8.140781429963675e-06,
736
+ "loss": 3.1355,
737
+ "step": 52000
738
+ },
739
+ {
740
+ "epoch": 6.6810893357088315,
741
+ "grad_norm": 173.08433532714844,
742
+ "learning_rate": 8.11863205457606e-06,
743
+ "loss": 3.1612,
744
+ "step": 52500
745
+ },
746
+ {
747
+ "epoch": 6.744718757953677,
748
+ "grad_norm": 179.25296020507812,
749
+ "learning_rate": 8.096482679188448e-06,
750
+ "loss": 3.1938,
751
+ "step": 53000
752
+ },
753
+ {
754
+ "epoch": 6.808348180198524,
755
+ "grad_norm": 138.08518981933594,
756
+ "learning_rate": 8.074333303800833e-06,
757
+ "loss": 3.1375,
758
+ "step": 53500
759
+ },
760
+ {
761
+ "epoch": 6.87197760244337,
762
+ "grad_norm": 106.96342468261719,
763
+ "learning_rate": 8.052183928413219e-06,
764
+ "loss": 3.1969,
765
+ "step": 54000
766
+ },
767
+ {
768
+ "epoch": 6.935607024688216,
769
+ "grad_norm": 127.7270278930664,
770
+ "learning_rate": 8.030034553025605e-06,
771
+ "loss": 3.2214,
772
+ "step": 54500
773
+ },
774
+ {
775
+ "epoch": 6.999236446933062,
776
+ "grad_norm": 151.88905334472656,
777
+ "learning_rate": 8.007885177637992e-06,
778
+ "loss": 3.1364,
779
+ "step": 55000
780
+ },
781
+ {
782
+ "epoch": 7.062865869177908,
783
+ "grad_norm": 146.13461303710938,
784
+ "learning_rate": 7.985735802250378e-06,
785
+ "loss": 2.63,
786
+ "step": 55500
787
+ },
788
+ {
789
+ "epoch": 7.126495291422754,
790
+ "grad_norm": 158.6125030517578,
791
+ "learning_rate": 7.963586426862763e-06,
792
+ "loss": 2.5451,
793
+ "step": 56000
794
+ },
795
+ {
796
+ "epoch": 7.1901247136676,
797
+ "grad_norm": 136.17828369140625,
798
+ "learning_rate": 7.941481350225924e-06,
799
+ "loss": 2.644,
800
+ "step": 56500
801
+ },
802
+ {
803
+ "epoch": 7.2537541359124456,
804
+ "grad_norm": 183.11447143554688,
805
+ "learning_rate": 7.91933197483831e-06,
806
+ "loss": 2.6482,
807
+ "step": 57000
808
+ },
809
+ {
810
+ "epoch": 7.317383558157292,
811
+ "grad_norm": 125.30079650878906,
812
+ "learning_rate": 7.897182599450696e-06,
813
+ "loss": 2.6017,
814
+ "step": 57500
815
+ },
816
+ {
817
+ "epoch": 7.381012980402138,
818
+ "grad_norm": 104.10094451904297,
819
+ "learning_rate": 7.875033224063083e-06,
820
+ "loss": 2.6626,
821
+ "step": 58000
822
+ },
823
+ {
824
+ "epoch": 7.444642402646984,
825
+ "grad_norm": 153.14060974121094,
826
+ "learning_rate": 7.852883848675467e-06,
827
+ "loss": 2.6698,
828
+ "step": 58500
829
+ },
830
+ {
831
+ "epoch": 7.50827182489183,
832
+ "grad_norm": 80.38119506835938,
833
+ "learning_rate": 7.830734473287854e-06,
834
+ "loss": 2.6595,
835
+ "step": 59000
836
+ },
837
+ {
838
+ "epoch": 7.571901247136676,
839
+ "grad_norm": 139.31524658203125,
840
+ "learning_rate": 7.80858509790024e-06,
841
+ "loss": 2.6683,
842
+ "step": 59500
843
+ },
844
+ {
845
+ "epoch": 7.635530669381522,
846
+ "grad_norm": 135.78240966796875,
847
+ "learning_rate": 7.786480021263401e-06,
848
+ "loss": 2.7187,
849
+ "step": 60000
850
+ },
851
+ {
852
+ "epoch": 7.699160091626368,
853
+ "grad_norm": 109.59832000732422,
854
+ "learning_rate": 7.764330645875787e-06,
855
+ "loss": 2.6213,
856
+ "step": 60500
857
+ },
858
+ {
859
+ "epoch": 7.762789513871214,
860
+ "grad_norm": 143.305908203125,
861
+ "learning_rate": 7.742181270488172e-06,
862
+ "loss": 2.7119,
863
+ "step": 61000
864
+ },
865
+ {
866
+ "epoch": 7.82641893611606,
867
+ "grad_norm": 147.27064514160156,
868
+ "learning_rate": 7.72003189510056e-06,
869
+ "loss": 2.739,
870
+ "step": 61500
871
+ },
872
+ {
873
+ "epoch": 7.8900483583609065,
874
+ "grad_norm": 109.4032211303711,
875
+ "learning_rate": 7.697882519712945e-06,
876
+ "loss": 2.686,
877
+ "step": 62000
878
+ },
879
+ {
880
+ "epoch": 7.953677780605752,
881
+ "grad_norm": 111.08818054199219,
882
+ "learning_rate": 7.675733144325331e-06,
883
+ "loss": 2.7295,
884
+ "step": 62500
885
+ },
886
+ {
887
+ "epoch": 8.017307202850597,
888
+ "grad_norm": 80.8994369506836,
889
+ "learning_rate": 7.653583768937717e-06,
890
+ "loss": 2.6062,
891
+ "step": 63000
892
+ },
893
+ {
894
+ "epoch": 8.080936625095443,
895
+ "grad_norm": 132.42283630371094,
896
+ "learning_rate": 7.631434393550102e-06,
897
+ "loss": 2.2272,
898
+ "step": 63500
899
+ },
900
+ {
901
+ "epoch": 8.144566047340291,
902
+ "grad_norm": 105.58837127685547,
903
+ "learning_rate": 7.6093293169132635e-06,
904
+ "loss": 2.2692,
905
+ "step": 64000
906
+ },
907
+ {
908
+ "epoch": 8.208195469585137,
909
+ "grad_norm": 165.8797149658203,
910
+ "learning_rate": 7.58717994152565e-06,
911
+ "loss": 2.3135,
912
+ "step": 64500
913
+ },
914
+ {
915
+ "epoch": 8.271824891829983,
916
+ "grad_norm": 103.73261260986328,
917
+ "learning_rate": 7.5650305661380356e-06,
918
+ "loss": 2.2546,
919
+ "step": 65000
920
+ },
921
+ {
922
+ "epoch": 8.335454314074829,
923
+ "grad_norm": 100.5468521118164,
924
+ "learning_rate": 7.542881190750422e-06,
925
+ "loss": 2.2882,
926
+ "step": 65500
927
+ },
928
+ {
929
+ "epoch": 8.399083736319675,
930
+ "grad_norm": 124.30194854736328,
931
+ "learning_rate": 7.520731815362808e-06,
932
+ "loss": 2.2749,
933
+ "step": 66000
934
+ },
935
+ {
936
+ "epoch": 8.46271315856452,
937
+ "grad_norm": 124.07736206054688,
938
+ "learning_rate": 7.498582439975194e-06,
939
+ "loss": 2.363,
940
+ "step": 66500
941
+ },
942
+ {
943
+ "epoch": 8.526342580809366,
944
+ "grad_norm": 110.9386978149414,
945
+ "learning_rate": 7.47643306458758e-06,
946
+ "loss": 2.2923,
947
+ "step": 67000
948
+ },
949
+ {
950
+ "epoch": 8.589972003054212,
951
+ "grad_norm": 129.3117218017578,
952
+ "learning_rate": 7.4542836891999645e-06,
953
+ "loss": 2.3275,
954
+ "step": 67500
955
+ },
956
+ {
957
+ "epoch": 8.653601425299058,
958
+ "grad_norm": 111.8931884765625,
959
+ "learning_rate": 7.432134313812351e-06,
960
+ "loss": 2.3738,
961
+ "step": 68000
962
+ },
963
+ {
964
+ "epoch": 8.717230847543904,
965
+ "grad_norm": 118.7526626586914,
966
+ "learning_rate": 7.409984938424737e-06,
967
+ "loss": 2.3416,
968
+ "step": 68500
969
+ },
970
+ {
971
+ "epoch": 8.78086026978875,
972
+ "grad_norm": 149.440673828125,
973
+ "learning_rate": 7.387835563037123e-06,
974
+ "loss": 2.3851,
975
+ "step": 69000
976
+ },
977
+ {
978
+ "epoch": 8.844489692033596,
979
+ "grad_norm": 122.81755828857422,
980
+ "learning_rate": 7.365730486400284e-06,
981
+ "loss": 2.3356,
982
+ "step": 69500
983
+ },
984
+ {
985
+ "epoch": 8.908119114278442,
986
+ "grad_norm": 132.1360626220703,
987
+ "learning_rate": 7.34358111101267e-06,
988
+ "loss": 2.3598,
989
+ "step": 70000
990
+ },
991
+ {
992
+ "epoch": 8.971748536523288,
993
+ "grad_norm": 125.38104248046875,
994
+ "learning_rate": 7.3214317356250565e-06,
995
+ "loss": 2.4272,
996
+ "step": 70500
997
+ },
998
+ {
999
+ "epoch": 9.035377958768134,
1000
+ "grad_norm": 94.84292602539062,
1001
+ "learning_rate": 7.299326658988217e-06,
1002
+ "loss": 2.141,
1003
+ "step": 71000
1004
+ },
1005
+ {
1006
+ "epoch": 9.09900738101298,
1007
+ "grad_norm": 108.36376190185547,
1008
+ "learning_rate": 7.2771772836006025e-06,
1009
+ "loss": 2.001,
1010
+ "step": 71500
1011
+ },
1012
+ {
1013
+ "epoch": 9.162636803257826,
1014
+ "grad_norm": 120.51274108886719,
1015
+ "learning_rate": 7.255027908212989e-06,
1016
+ "loss": 2.014,
1017
+ "step": 72000
1018
+ },
1019
+ {
1020
+ "epoch": 9.226266225502673,
1021
+ "grad_norm": 76.73661041259766,
1022
+ "learning_rate": 7.232878532825375e-06,
1023
+ "loss": 1.9826,
1024
+ "step": 72500
1025
+ },
1026
+ {
1027
+ "epoch": 9.28989564774752,
1028
+ "grad_norm": 93.48287200927734,
1029
+ "learning_rate": 7.210729157437761e-06,
1030
+ "loss": 1.995,
1031
+ "step": 73000
1032
+ },
1033
+ {
1034
+ "epoch": 9.353525069992365,
1035
+ "grad_norm": 87.56092071533203,
1036
+ "learning_rate": 7.188579782050147e-06,
1037
+ "loss": 2.0097,
1038
+ "step": 73500
1039
+ },
1040
+ {
1041
+ "epoch": 9.417154492237211,
1042
+ "grad_norm": 128.68373107910156,
1043
+ "learning_rate": 7.166430406662532e-06,
1044
+ "loss": 2.0412,
1045
+ "step": 74000
1046
+ },
1047
+ {
1048
+ "epoch": 9.480783914482057,
1049
+ "grad_norm": 101.52668762207031,
1050
+ "learning_rate": 7.144281031274919e-06,
1051
+ "loss": 2.0144,
1052
+ "step": 74500
1053
+ },
1054
+ {
1055
+ "epoch": 9.544413336726903,
1056
+ "grad_norm": 90.50218963623047,
1057
+ "learning_rate": 7.12217595463808e-06,
1058
+ "loss": 2.0653,
1059
+ "step": 75000
1060
+ },
1061
+ {
1062
+ "epoch": 9.608042758971749,
1063
+ "grad_norm": 113.8707046508789,
1064
+ "learning_rate": 7.100026579250465e-06,
1065
+ "loss": 2.022,
1066
+ "step": 75500
1067
+ },
1068
+ {
1069
+ "epoch": 9.671672181216595,
1070
+ "grad_norm": 78.54847717285156,
1071
+ "learning_rate": 7.077921502613627e-06,
1072
+ "loss": 2.0327,
1073
+ "step": 76000
1074
+ },
1075
+ {
1076
+ "epoch": 9.73530160346144,
1077
+ "grad_norm": 131.4427947998047,
1078
+ "learning_rate": 7.055772127226013e-06,
1079
+ "loss": 2.0596,
1080
+ "step": 76500
1081
+ },
1082
+ {
1083
+ "epoch": 9.798931025706286,
1084
+ "grad_norm": 120.61900329589844,
1085
+ "learning_rate": 7.033667050589174e-06,
1086
+ "loss": 2.0761,
1087
+ "step": 77000
1088
+ },
1089
+ {
1090
+ "epoch": 9.862560447951132,
1091
+ "grad_norm": 84.29814147949219,
1092
+ "learning_rate": 7.01151767520156e-06,
1093
+ "loss": 2.1245,
1094
+ "step": 77500
1095
+ },
1096
+ {
1097
+ "epoch": 9.926189870195978,
1098
+ "grad_norm": 91.78532409667969,
1099
+ "learning_rate": 6.989368299813946e-06,
1100
+ "loss": 2.1062,
1101
+ "step": 78000
1102
+ },
1103
+ {
1104
+ "epoch": 9.989819292440824,
1105
+ "grad_norm": 111.85667419433594,
1106
+ "learning_rate": 6.9672189244263324e-06,
1107
+ "loss": 2.1186,
1108
+ "step": 78500
1109
+ },
1110
+ {
1111
+ "epoch": 10.05344871468567,
1112
+ "grad_norm": 97.98519897460938,
1113
+ "learning_rate": 6.945113847789493e-06,
1114
+ "loss": 1.8283,
1115
+ "step": 79000
1116
+ },
1117
+ {
1118
+ "epoch": 10.117078136930516,
1119
+ "grad_norm": 80.28434753417969,
1120
+ "learning_rate": 6.9229644724018785e-06,
1121
+ "loss": 1.7627,
1122
+ "step": 79500
1123
+ },
1124
+ {
1125
+ "epoch": 10.180707559175362,
1126
+ "grad_norm": 99.89539337158203,
1127
+ "learning_rate": 6.900859395765041e-06,
1128
+ "loss": 1.7775,
1129
+ "step": 80000
1130
+ },
1131
+ {
1132
+ "epoch": 10.244336981420208,
1133
+ "grad_norm": 87.49510955810547,
1134
+ "learning_rate": 6.878710020377426e-06,
1135
+ "loss": 1.7865,
1136
+ "step": 80500
1137
+ },
1138
+ {
1139
+ "epoch": 10.307966403665056,
1140
+ "grad_norm": 87.29383850097656,
1141
+ "learning_rate": 6.856560644989811e-06,
1142
+ "loss": 1.8018,
1143
+ "step": 81000
1144
+ },
1145
+ {
1146
+ "epoch": 10.371595825909901,
1147
+ "grad_norm": 88.82074737548828,
1148
+ "learning_rate": 6.834411269602198e-06,
1149
+ "loss": 1.7851,
1150
+ "step": 81500
1151
+ },
1152
+ {
1153
+ "epoch": 10.435225248154747,
1154
+ "grad_norm": 90.42290496826172,
1155
+ "learning_rate": 6.812261894214583e-06,
1156
+ "loss": 1.8085,
1157
+ "step": 82000
1158
+ },
1159
+ {
1160
+ "epoch": 10.498854670399593,
1161
+ "grad_norm": 85.89569091796875,
1162
+ "learning_rate": 6.7901125188269704e-06,
1163
+ "loss": 1.8293,
1164
+ "step": 82500
1165
+ },
1166
+ {
1167
+ "epoch": 10.56248409264444,
1168
+ "grad_norm": 89.20499420166016,
1169
+ "learning_rate": 6.767963143439355e-06,
1170
+ "loss": 1.8549,
1171
+ "step": 83000
1172
+ },
1173
+ {
1174
+ "epoch": 10.626113514889285,
1175
+ "grad_norm": 193.05775451660156,
1176
+ "learning_rate": 6.745813768051741e-06,
1177
+ "loss": 1.8531,
1178
+ "step": 83500
1179
+ },
1180
+ {
1181
+ "epoch": 10.689742937134131,
1182
+ "grad_norm": 106.58789825439453,
1183
+ "learning_rate": 6.723664392664127e-06,
1184
+ "loss": 1.8538,
1185
+ "step": 84000
1186
+ },
1187
+ {
1188
+ "epoch": 10.753372359378977,
1189
+ "grad_norm": 136.8468780517578,
1190
+ "learning_rate": 6.701515017276513e-06,
1191
+ "loss": 1.8814,
1192
+ "step": 84500
1193
+ },
1194
+ {
1195
+ "epoch": 10.817001781623823,
1196
+ "grad_norm": 128.12271118164062,
1197
+ "learning_rate": 6.679365641888899e-06,
1198
+ "loss": 1.8576,
1199
+ "step": 85000
1200
+ },
1201
+ {
1202
+ "epoch": 10.880631203868669,
1203
+ "grad_norm": 70.90370178222656,
1204
+ "learning_rate": 6.657216266501285e-06,
1205
+ "loss": 1.8516,
1206
+ "step": 85500
1207
+ },
1208
+ {
1209
+ "epoch": 10.944260626113515,
1210
+ "grad_norm": 77.27445220947266,
1211
+ "learning_rate": 6.635066891113671e-06,
1212
+ "loss": 1.8555,
1213
+ "step": 86000
1214
+ },
1215
+ {
1216
+ "epoch": 11.00789004835836,
1217
+ "grad_norm": 108.38621520996094,
1218
+ "learning_rate": 6.612917515726057e-06,
1219
+ "loss": 1.8631,
1220
+ "step": 86500
1221
+ },
1222
+ {
1223
+ "epoch": 11.071519470603207,
1224
+ "grad_norm": 145.12940979003906,
1225
+ "learning_rate": 6.590768140338443e-06,
1226
+ "loss": 1.6189,
1227
+ "step": 87000
1228
+ },
1229
+ {
1230
+ "epoch": 11.135148892848052,
1231
+ "grad_norm": 115.5062484741211,
1232
+ "learning_rate": 6.568618764950829e-06,
1233
+ "loss": 1.6143,
1234
+ "step": 87500
1235
+ },
1236
+ {
1237
+ "epoch": 11.198778315092898,
1238
+ "grad_norm": 71.71438598632812,
1239
+ "learning_rate": 6.546469389563215e-06,
1240
+ "loss": 1.6246,
1241
+ "step": 88000
1242
+ },
1243
+ {
1244
+ "epoch": 11.262407737337744,
1245
+ "grad_norm": 89.9764633178711,
1246
+ "learning_rate": 6.5243200141756004e-06,
1247
+ "loss": 1.5997,
1248
+ "step": 88500
1249
+ },
1250
+ {
1251
+ "epoch": 11.32603715958259,
1252
+ "grad_norm": 80.51982879638672,
1253
+ "learning_rate": 6.502170638787987e-06,
1254
+ "loss": 1.646,
1255
+ "step": 89000
1256
+ },
1257
+ {
1258
+ "epoch": 11.389666581827438,
1259
+ "grad_norm": 87.14283752441406,
1260
+ "learning_rate": 6.4800212634003725e-06,
1261
+ "loss": 1.6323,
1262
+ "step": 89500
1263
+ },
1264
+ {
1265
+ "epoch": 11.453296004072284,
1266
+ "grad_norm": 76.05656433105469,
1267
+ "learning_rate": 6.457871888012759e-06,
1268
+ "loss": 1.6623,
1269
+ "step": 90000
1270
+ },
1271
+ {
1272
+ "epoch": 11.51692542631713,
1273
+ "grad_norm": 84.08787536621094,
1274
+ "learning_rate": 6.435722512625145e-06,
1275
+ "loss": 1.6544,
1276
+ "step": 90500
1277
+ },
1278
+ {
1279
+ "epoch": 11.580554848561976,
1280
+ "grad_norm": 113.19395446777344,
1281
+ "learning_rate": 6.413573137237531e-06,
1282
+ "loss": 1.6671,
1283
+ "step": 91000
1284
+ },
1285
+ {
1286
+ "epoch": 11.644184270806822,
1287
+ "grad_norm": 92.68965911865234,
1288
+ "learning_rate": 6.391423761849917e-06,
1289
+ "loss": 1.6742,
1290
+ "step": 91500
1291
+ },
1292
+ {
1293
+ "epoch": 11.707813693051667,
1294
+ "grad_norm": 116.95278930664062,
1295
+ "learning_rate": 6.369274386462302e-06,
1296
+ "loss": 1.6409,
1297
+ "step": 92000
1298
+ },
1299
+ {
1300
+ "epoch": 11.771443115296513,
1301
+ "grad_norm": 77.6058120727539,
1302
+ "learning_rate": 6.347213608576238e-06,
1303
+ "loss": 1.6504,
1304
+ "step": 92500
1305
+ },
1306
+ {
1307
+ "epoch": 11.83507253754136,
1308
+ "grad_norm": 74.96102142333984,
1309
+ "learning_rate": 6.3251085319394e-06,
1310
+ "loss": 1.6791,
1311
+ "step": 93000
1312
+ },
1313
+ {
1314
+ "epoch": 11.898701959786205,
1315
+ "grad_norm": 95.83757781982422,
1316
+ "learning_rate": 6.302959156551785e-06,
1317
+ "loss": 1.6923,
1318
+ "step": 93500
1319
+ },
1320
+ {
1321
+ "epoch": 11.962331382031051,
1322
+ "grad_norm": 114.55757141113281,
1323
+ "learning_rate": 6.280809781164172e-06,
1324
+ "loss": 1.697,
1325
+ "step": 94000
1326
+ },
1327
+ {
1328
+ "epoch": 12.025960804275897,
1329
+ "grad_norm": 59.73118591308594,
1330
+ "learning_rate": 6.258660405776557e-06,
1331
+ "loss": 1.6136,
1332
+ "step": 94500
1333
+ },
1334
+ {
1335
+ "epoch": 12.089590226520743,
1336
+ "grad_norm": 86.23199462890625,
1337
+ "learning_rate": 6.236511030388943e-06,
1338
+ "loss": 1.4437,
1339
+ "step": 95000
1340
+ },
1341
+ {
1342
+ "epoch": 12.153219648765589,
1343
+ "grad_norm": 71.51868438720703,
1344
+ "learning_rate": 6.2143616550013295e-06,
1345
+ "loss": 1.49,
1346
+ "step": 95500
1347
+ },
1348
+ {
1349
+ "epoch": 12.216849071010435,
1350
+ "grad_norm": 96.19779205322266,
1351
+ "learning_rate": 6.192212279613715e-06,
1352
+ "loss": 1.4567,
1353
+ "step": 96000
1354
+ },
1355
+ {
1356
+ "epoch": 12.28047849325528,
1357
+ "grad_norm": 79.43608093261719,
1358
+ "learning_rate": 6.170062904226102e-06,
1359
+ "loss": 1.5007,
1360
+ "step": 96500
1361
+ },
1362
+ {
1363
+ "epoch": 12.344107915500127,
1364
+ "grad_norm": 79.2935791015625,
1365
+ "learning_rate": 6.147913528838487e-06,
1366
+ "loss": 1.4826,
1367
+ "step": 97000
1368
+ },
1369
+ {
1370
+ "epoch": 12.407737337744972,
1371
+ "grad_norm": 144.53054809570312,
1372
+ "learning_rate": 6.125764153450873e-06,
1373
+ "loss": 1.4668,
1374
+ "step": 97500
1375
+ },
1376
+ {
1377
+ "epoch": 12.47136675998982,
1378
+ "grad_norm": 105.31471252441406,
1379
+ "learning_rate": 6.103659076814035e-06,
1380
+ "loss": 1.5009,
1381
+ "step": 98000
1382
+ },
1383
+ {
1384
+ "epoch": 12.534996182234666,
1385
+ "grad_norm": 79.45948028564453,
1386
+ "learning_rate": 6.08150970142642e-06,
1387
+ "loss": 1.5008,
1388
+ "step": 98500
1389
+ },
1390
+ {
1391
+ "epoch": 12.598625604479512,
1392
+ "grad_norm": 100.81867218017578,
1393
+ "learning_rate": 6.059360326038807e-06,
1394
+ "loss": 1.5336,
1395
+ "step": 99000
1396
+ },
1397
+ {
1398
+ "epoch": 12.662255026724358,
1399
+ "grad_norm": 94.66363525390625,
1400
+ "learning_rate": 6.037210950651192e-06,
1401
+ "loss": 1.5057,
1402
+ "step": 99500
1403
+ },
1404
+ {
1405
+ "epoch": 12.725884448969204,
1406
+ "grad_norm": 73.030517578125,
1407
+ "learning_rate": 6.0150615752635775e-06,
1408
+ "loss": 1.5081,
1409
+ "step": 100000
1410
+ },
1411
+ {
1412
+ "epoch": 12.78951387121405,
1413
+ "grad_norm": 67.0549545288086,
1414
+ "learning_rate": 5.99295649862674e-06,
1415
+ "loss": 1.5402,
1416
+ "step": 100500
1417
+ },
1418
+ {
1419
+ "epoch": 12.853143293458896,
1420
+ "grad_norm": 91.37773895263672,
1421
+ "learning_rate": 5.970807123239125e-06,
1422
+ "loss": 1.5519,
1423
+ "step": 101000
1424
+ },
1425
+ {
1426
+ "epoch": 12.916772715703742,
1427
+ "grad_norm": 87.36595153808594,
1428
+ "learning_rate": 5.948657747851511e-06,
1429
+ "loss": 1.5171,
1430
+ "step": 101500
1431
+ },
1432
+ {
1433
+ "epoch": 12.980402137948587,
1434
+ "grad_norm": 82.45221710205078,
1435
+ "learning_rate": 5.926508372463897e-06,
1436
+ "loss": 1.5249,
1437
+ "step": 102000
1438
+ },
1439
+ {
1440
+ "epoch": 13.044031560193433,
1441
+ "grad_norm": 67.87359619140625,
1442
+ "learning_rate": 5.904358997076283e-06,
1443
+ "loss": 1.4117,
1444
+ "step": 102500
1445
+ },
1446
+ {
1447
+ "epoch": 13.10766098243828,
1448
+ "grad_norm": 77.75003814697266,
1449
+ "learning_rate": 5.882209621688669e-06,
1450
+ "loss": 1.3524,
1451
+ "step": 103000
1452
+ },
1453
+ {
1454
+ "epoch": 13.171290404683125,
1455
+ "grad_norm": 103.19142150878906,
1456
+ "learning_rate": 5.860060246301055e-06,
1457
+ "loss": 1.3564,
1458
+ "step": 103500
1459
+ },
1460
+ {
1461
+ "epoch": 13.234919826927971,
1462
+ "grad_norm": 82.8349380493164,
1463
+ "learning_rate": 5.837999468414991e-06,
1464
+ "loss": 1.3483,
1465
+ "step": 104000
1466
+ },
1467
+ {
1468
+ "epoch": 13.298549249172817,
1469
+ "grad_norm": 83.94813537597656,
1470
+ "learning_rate": 5.815850093027378e-06,
1471
+ "loss": 1.386,
1472
+ "step": 104500
1473
+ },
1474
+ {
1475
+ "epoch": 13.362178671417663,
1476
+ "grad_norm": 80.00110626220703,
1477
+ "learning_rate": 5.793700717639763e-06,
1478
+ "loss": 1.3723,
1479
+ "step": 105000
1480
+ },
1481
+ {
1482
+ "epoch": 13.425808093662509,
1483
+ "grad_norm": 79.54706573486328,
1484
+ "learning_rate": 5.771551342252149e-06,
1485
+ "loss": 1.3933,
1486
+ "step": 105500
1487
+ },
1488
+ {
1489
+ "epoch": 13.489437515907355,
1490
+ "grad_norm": 118.33966827392578,
1491
+ "learning_rate": 5.749401966864535e-06,
1492
+ "loss": 1.3672,
1493
+ "step": 106000
1494
+ },
1495
+ {
1496
+ "epoch": 13.553066938152202,
1497
+ "grad_norm": 148.68141174316406,
1498
+ "learning_rate": 5.727252591476921e-06,
1499
+ "loss": 1.3796,
1500
+ "step": 106500
1501
+ },
1502
+ {
1503
+ "epoch": 13.616696360397048,
1504
+ "grad_norm": 81.23079681396484,
1505
+ "learning_rate": 5.705103216089307e-06,
1506
+ "loss": 1.3637,
1507
+ "step": 107000
1508
+ },
1509
+ {
1510
+ "epoch": 13.680325782641894,
1511
+ "grad_norm": 118.37026977539062,
1512
+ "learning_rate": 5.682953840701693e-06,
1513
+ "loss": 1.4061,
1514
+ "step": 107500
1515
+ },
1516
+ {
1517
+ "epoch": 13.74395520488674,
1518
+ "grad_norm": 87.67139434814453,
1519
+ "learning_rate": 5.660804465314078e-06,
1520
+ "loss": 1.3897,
1521
+ "step": 108000
1522
+ },
1523
+ {
1524
+ "epoch": 13.807584627131586,
1525
+ "grad_norm": 76.84065246582031,
1526
+ "learning_rate": 5.638655089926465e-06,
1527
+ "loss": 1.4342,
1528
+ "step": 108500
1529
+ },
1530
+ {
1531
+ "epoch": 13.871214049376432,
1532
+ "grad_norm": 83.0779037475586,
1533
+ "learning_rate": 5.61650571453885e-06,
1534
+ "loss": 1.3821,
1535
+ "step": 109000
1536
+ },
1537
+ {
1538
+ "epoch": 13.934843471621278,
1539
+ "grad_norm": 63.323001861572266,
1540
+ "learning_rate": 5.594400637902012e-06,
1541
+ "loss": 1.411,
1542
+ "step": 109500
1543
+ },
1544
+ {
1545
+ "epoch": 13.998472893866124,
1546
+ "grad_norm": 75.757080078125,
1547
+ "learning_rate": 5.572295561265173e-06,
1548
+ "loss": 1.4214,
1549
+ "step": 110000
1550
+ },
1551
+ {
1552
+ "epoch": 14.06210231611097,
1553
+ "grad_norm": 47.76633071899414,
1554
+ "learning_rate": 5.550146185877559e-06,
1555
+ "loss": 1.2551,
1556
+ "step": 110500
1557
+ },
1558
+ {
1559
+ "epoch": 14.125731738355816,
1560
+ "grad_norm": 67.52932739257812,
1561
+ "learning_rate": 5.528041109240719e-06,
1562
+ "loss": 1.2366,
1563
+ "step": 111000
1564
+ },
1565
+ {
1566
+ "epoch": 14.189361160600662,
1567
+ "grad_norm": 77.91776275634766,
1568
+ "learning_rate": 5.505891733853106e-06,
1569
+ "loss": 1.2553,
1570
+ "step": 111500
1571
+ },
1572
+ {
1573
+ "epoch": 14.252990582845507,
1574
+ "grad_norm": 74.56119537353516,
1575
+ "learning_rate": 5.4837423584654914e-06,
1576
+ "loss": 1.2553,
1577
+ "step": 112000
1578
+ },
1579
+ {
1580
+ "epoch": 14.316620005090353,
1581
+ "grad_norm": 70.80554962158203,
1582
+ "learning_rate": 5.461592983077878e-06,
1583
+ "loss": 1.2624,
1584
+ "step": 112500
1585
+ },
1586
+ {
1587
+ "epoch": 14.3802494273352,
1588
+ "grad_norm": 72.7087631225586,
1589
+ "learning_rate": 5.4394436076902635e-06,
1590
+ "loss": 1.2771,
1591
+ "step": 113000
1592
+ },
1593
+ {
1594
+ "epoch": 14.443878849580045,
1595
+ "grad_norm": 81.98471069335938,
1596
+ "learning_rate": 5.41729423230265e-06,
1597
+ "loss": 1.2744,
1598
+ "step": 113500
1599
+ },
1600
+ {
1601
+ "epoch": 14.507508271824891,
1602
+ "grad_norm": 71.72978973388672,
1603
+ "learning_rate": 5.395189155665811e-06,
1604
+ "loss": 1.2616,
1605
+ "step": 114000
1606
+ },
1607
+ {
1608
+ "epoch": 14.571137694069737,
1609
+ "grad_norm": 73.07415771484375,
1610
+ "learning_rate": 5.373039780278196e-06,
1611
+ "loss": 1.2744,
1612
+ "step": 114500
1613
+ },
1614
+ {
1615
+ "epoch": 14.634767116314585,
1616
+ "grad_norm": 46.78715133666992,
1617
+ "learning_rate": 5.350890404890583e-06,
1618
+ "loss": 1.2705,
1619
+ "step": 115000
1620
+ },
1621
+ {
1622
+ "epoch": 14.69839653855943,
1623
+ "grad_norm": 80.48126220703125,
1624
+ "learning_rate": 5.328741029502968e-06,
1625
+ "loss": 1.3005,
1626
+ "step": 115500
1627
+ },
1628
+ {
1629
+ "epoch": 14.762025960804277,
1630
+ "grad_norm": 78.11446380615234,
1631
+ "learning_rate": 5.306591654115354e-06,
1632
+ "loss": 1.3013,
1633
+ "step": 116000
1634
+ },
1635
+ {
1636
+ "epoch": 14.825655383049122,
1637
+ "grad_norm": 113.7435302734375,
1638
+ "learning_rate": 5.28444227872774e-06,
1639
+ "loss": 1.298,
1640
+ "step": 116500
1641
+ },
1642
+ {
1643
+ "epoch": 14.889284805293968,
1644
+ "grad_norm": 58.536346435546875,
1645
+ "learning_rate": 5.262292903340126e-06,
1646
+ "loss": 1.2972,
1647
+ "step": 117000
1648
+ },
1649
+ {
1650
+ "epoch": 14.952914227538814,
1651
+ "grad_norm": 85.87594604492188,
1652
+ "learning_rate": 5.240143527952512e-06,
1653
+ "loss": 1.277,
1654
+ "step": 117500
1655
+ },
1656
+ {
1657
+ "epoch": 15.01654364978366,
1658
+ "grad_norm": 61.39375305175781,
1659
+ "learning_rate": 5.217994152564898e-06,
1660
+ "loss": 1.2718,
1661
+ "step": 118000
1662
+ },
1663
+ {
1664
+ "epoch": 15.080173072028506,
1665
+ "grad_norm": 64.70631408691406,
1666
+ "learning_rate": 5.1958447771772836e-06,
1667
+ "loss": 1.1697,
1668
+ "step": 118500
1669
+ },
1670
+ {
1671
+ "epoch": 15.143802494273352,
1672
+ "grad_norm": 81.51799774169922,
1673
+ "learning_rate": 5.17369540178967e-06,
1674
+ "loss": 1.1819,
1675
+ "step": 119000
1676
+ },
1677
+ {
1678
+ "epoch": 15.207431916518198,
1679
+ "grad_norm": 81.38251495361328,
1680
+ "learning_rate": 5.151546026402056e-06,
1681
+ "loss": 1.1916,
1682
+ "step": 119500
1683
+ },
1684
+ {
1685
+ "epoch": 15.271061338763044,
1686
+ "grad_norm": 87.31340789794922,
1687
+ "learning_rate": 5.129396651014442e-06,
1688
+ "loss": 1.1829,
1689
+ "step": 120000
1690
+ },
1691
+ {
1692
+ "epoch": 15.33469076100789,
1693
+ "grad_norm": 67.25629425048828,
1694
+ "learning_rate": 5.107247275626828e-06,
1695
+ "loss": 1.1632,
1696
+ "step": 120500
1697
+ },
1698
+ {
1699
+ "epoch": 15.398320183252736,
1700
+ "grad_norm": 56.04712677001953,
1701
+ "learning_rate": 5.085097900239213e-06,
1702
+ "loss": 1.1809,
1703
+ "step": 121000
1704
+ },
1705
+ {
1706
+ "epoch": 15.461949605497582,
1707
+ "grad_norm": 66.33815002441406,
1708
+ "learning_rate": 5.0629928236023755e-06,
1709
+ "loss": 1.1913,
1710
+ "step": 121500
1711
+ },
1712
+ {
1713
+ "epoch": 15.525579027742427,
1714
+ "grad_norm": 69.98699951171875,
1715
+ "learning_rate": 5.04084344821476e-06,
1716
+ "loss": 1.1916,
1717
+ "step": 122000
1718
+ },
1719
+ {
1720
+ "epoch": 15.589208449987273,
1721
+ "grad_norm": 70.65410614013672,
1722
+ "learning_rate": 5.018694072827147e-06,
1723
+ "loss": 1.1969,
1724
+ "step": 122500
1725
+ },
1726
+ {
1727
+ "epoch": 15.65283787223212,
1728
+ "grad_norm": 68.66796875,
1729
+ "learning_rate": 4.996544697439532e-06,
1730
+ "loss": 1.1929,
1731
+ "step": 123000
1732
+ },
1733
+ {
1734
+ "epoch": 15.716467294476967,
1735
+ "grad_norm": 68.35984802246094,
1736
+ "learning_rate": 4.974439620802694e-06,
1737
+ "loss": 1.2086,
1738
+ "step": 123500
1739
+ },
1740
+ {
1741
+ "epoch": 15.780096716721813,
1742
+ "grad_norm": 64.63552856445312,
1743
+ "learning_rate": 4.952290245415079e-06,
1744
+ "loss": 1.1864,
1745
+ "step": 124000
1746
+ },
1747
+ {
1748
+ "epoch": 15.843726138966659,
1749
+ "grad_norm": 59.172645568847656,
1750
+ "learning_rate": 4.930140870027466e-06,
1751
+ "loss": 1.2068,
1752
+ "step": 124500
1753
+ },
1754
+ {
1755
+ "epoch": 15.907355561211505,
1756
+ "grad_norm": 64.2562255859375,
1757
+ "learning_rate": 4.907991494639851e-06,
1758
+ "loss": 1.2253,
1759
+ "step": 125000
1760
+ },
1761
+ {
1762
+ "epoch": 15.97098498345635,
1763
+ "grad_norm": 61.80392837524414,
1764
+ "learning_rate": 4.885842119252238e-06,
1765
+ "loss": 1.1963,
1766
+ "step": 125500
1767
+ },
1768
+ {
1769
+ "epoch": 16.034614405701195,
1770
+ "grad_norm": 86.79552459716797,
1771
+ "learning_rate": 4.8636927438646234e-06,
1772
+ "loss": 1.1585,
1773
+ "step": 126000
1774
+ },
1775
+ {
1776
+ "epoch": 16.098243827946042,
1777
+ "grad_norm": 81.79373931884766,
1778
+ "learning_rate": 4.841543368477009e-06,
1779
+ "loss": 1.0834,
1780
+ "step": 126500
1781
+ },
1782
+ {
1783
+ "epoch": 16.161873250190887,
1784
+ "grad_norm": 60.24835205078125,
1785
+ "learning_rate": 4.8193939930893955e-06,
1786
+ "loss": 1.0937,
1787
+ "step": 127000
1788
+ },
1789
+ {
1790
+ "epoch": 16.225502672435734,
1791
+ "grad_norm": 74.93160247802734,
1792
+ "learning_rate": 4.797244617701781e-06,
1793
+ "loss": 1.0995,
1794
+ "step": 127500
1795
+ },
1796
+ {
1797
+ "epoch": 16.289132094680582,
1798
+ "grad_norm": 75.08971405029297,
1799
+ "learning_rate": 4.775095242314168e-06,
1800
+ "loss": 1.0787,
1801
+ "step": 128000
1802
+ },
1803
+ {
1804
+ "epoch": 16.352761516925426,
1805
+ "grad_norm": 66.41687774658203,
1806
+ "learning_rate": 4.752990165677328e-06,
1807
+ "loss": 1.1217,
1808
+ "step": 128500
1809
+ },
1810
+ {
1811
+ "epoch": 16.416390939170274,
1812
+ "grad_norm": 68.62983703613281,
1813
+ "learning_rate": 4.730840790289714e-06,
1814
+ "loss": 1.1185,
1815
+ "step": 129000
1816
+ },
1817
+ {
1818
+ "epoch": 16.480020361415118,
1819
+ "grad_norm": 67.19387817382812,
1820
+ "learning_rate": 4.7086914149021e-06,
1821
+ "loss": 1.1203,
1822
+ "step": 129500
1823
+ },
1824
+ {
1825
+ "epoch": 16.543649783659966,
1826
+ "grad_norm": 84.18933868408203,
1827
+ "learning_rate": 4.686542039514486e-06,
1828
+ "loss": 1.1201,
1829
+ "step": 130000
1830
+ },
1831
+ {
1832
+ "epoch": 16.60727920590481,
1833
+ "grad_norm": 56.41159439086914,
1834
+ "learning_rate": 4.664392664126872e-06,
1835
+ "loss": 1.125,
1836
+ "step": 130500
1837
+ },
1838
+ {
1839
+ "epoch": 16.670908628149657,
1840
+ "grad_norm": 90.81446075439453,
1841
+ "learning_rate": 4.642376184991584e-06,
1842
+ "loss": 1.1214,
1843
+ "step": 131000
1844
+ },
1845
+ {
1846
+ "epoch": 16.7345380503945,
1847
+ "grad_norm": 66.80113220214844,
1848
+ "learning_rate": 4.62022680960397e-06,
1849
+ "loss": 1.1228,
1850
+ "step": 131500
1851
+ },
1852
+ {
1853
+ "epoch": 16.79816747263935,
1854
+ "grad_norm": 68.8161849975586,
1855
+ "learning_rate": 4.598077434216355e-06,
1856
+ "loss": 1.1381,
1857
+ "step": 132000
1858
+ },
1859
+ {
1860
+ "epoch": 16.861796894884193,
1861
+ "grad_norm": 80.11527252197266,
1862
+ "learning_rate": 4.575928058828742e-06,
1863
+ "loss": 1.1414,
1864
+ "step": 132500
1865
+ },
1866
+ {
1867
+ "epoch": 16.92542631712904,
1868
+ "grad_norm": 64.7822036743164,
1869
+ "learning_rate": 4.553778683441127e-06,
1870
+ "loss": 1.123,
1871
+ "step": 133000
1872
+ },
1873
+ {
1874
+ "epoch": 16.989055739373885,
1875
+ "grad_norm": 65.32453918457031,
1876
+ "learning_rate": 4.531629308053514e-06,
1877
+ "loss": 1.1003,
1878
+ "step": 133500
1879
+ },
1880
+ {
1881
+ "epoch": 17.052685161618733,
1882
+ "grad_norm": 91.62205505371094,
1883
+ "learning_rate": 4.5094799326658994e-06,
1884
+ "loss": 1.0447,
1885
+ "step": 134000
1886
+ },
1887
+ {
1888
+ "epoch": 17.116314583863577,
1889
+ "grad_norm": 49.810699462890625,
1890
+ "learning_rate": 4.487330557278285e-06,
1891
+ "loss": 1.036,
1892
+ "step": 134500
1893
+ },
1894
+ {
1895
+ "epoch": 17.179944006108425,
1896
+ "grad_norm": 64.78093719482422,
1897
+ "learning_rate": 4.465181181890671e-06,
1898
+ "loss": 1.0264,
1899
+ "step": 135000
1900
+ },
1901
+ {
1902
+ "epoch": 17.24357342835327,
1903
+ "grad_norm": 61.86587905883789,
1904
+ "learning_rate": 4.443031806503057e-06,
1905
+ "loss": 1.0375,
1906
+ "step": 135500
1907
+ },
1908
+ {
1909
+ "epoch": 17.307202850598117,
1910
+ "grad_norm": 57.83167266845703,
1911
+ "learning_rate": 4.420882431115443e-06,
1912
+ "loss": 1.0509,
1913
+ "step": 136000
1914
+ },
1915
+ {
1916
+ "epoch": 17.370832272842964,
1917
+ "grad_norm": 81.06472778320312,
1918
+ "learning_rate": 4.398733055727829e-06,
1919
+ "loss": 1.0452,
1920
+ "step": 136500
1921
+ },
1922
+ {
1923
+ "epoch": 17.43446169508781,
1924
+ "grad_norm": 77.34078979492188,
1925
+ "learning_rate": 4.376583680340215e-06,
1926
+ "loss": 1.0519,
1927
+ "step": 137000
1928
+ },
1929
+ {
1930
+ "epoch": 17.498091117332656,
1931
+ "grad_norm": 75.93341064453125,
1932
+ "learning_rate": 4.3544343049526005e-06,
1933
+ "loss": 1.0498,
1934
+ "step": 137500
1935
+ },
1936
+ {
1937
+ "epoch": 17.5617205395775,
1938
+ "grad_norm": 56.93536376953125,
1939
+ "learning_rate": 4.332284929564987e-06,
1940
+ "loss": 1.0514,
1941
+ "step": 138000
1942
+ },
1943
+ {
1944
+ "epoch": 17.625349961822348,
1945
+ "grad_norm": 63.56499481201172,
1946
+ "learning_rate": 4.310179852928148e-06,
1947
+ "loss": 1.054,
1948
+ "step": 138500
1949
+ },
1950
+ {
1951
+ "epoch": 17.688979384067192,
1952
+ "grad_norm": 71.97218322753906,
1953
+ "learning_rate": 4.288030477540534e-06,
1954
+ "loss": 1.0457,
1955
+ "step": 139000
1956
+ },
1957
+ {
1958
+ "epoch": 17.75260880631204,
1959
+ "grad_norm": 63.93644332885742,
1960
+ "learning_rate": 4.265925400903695e-06,
1961
+ "loss": 1.0582,
1962
+ "step": 139500
1963
+ },
1964
+ {
1965
+ "epoch": 17.816238228556884,
1966
+ "grad_norm": 60.03602981567383,
1967
+ "learning_rate": 4.243776025516081e-06,
1968
+ "loss": 1.0566,
1969
+ "step": 140000
1970
+ },
1971
+ {
1972
+ "epoch": 17.87986765080173,
1973
+ "grad_norm": 66.63105010986328,
1974
+ "learning_rate": 4.221626650128467e-06,
1975
+ "loss": 1.0644,
1976
+ "step": 140500
1977
+ },
1978
+ {
1979
+ "epoch": 17.943497073046576,
1980
+ "grad_norm": 66.42560577392578,
1981
+ "learning_rate": 4.199477274740853e-06,
1982
+ "loss": 1.0579,
1983
+ "step": 141000
1984
+ },
1985
+ {
1986
+ "epoch": 18.007126495291423,
1987
+ "grad_norm": 58.22215270996094,
1988
+ "learning_rate": 4.1773278993532385e-06,
1989
+ "loss": 1.0647,
1990
+ "step": 141500
1991
+ },
1992
+ {
1993
+ "epoch": 18.070755917536268,
1994
+ "grad_norm": 63.40778350830078,
1995
+ "learning_rate": 4.155178523965624e-06,
1996
+ "loss": 0.9704,
1997
+ "step": 142000
1998
+ },
1999
+ {
2000
+ "epoch": 18.134385339781115,
2001
+ "grad_norm": 67.59272003173828,
2002
+ "learning_rate": 4.1330291485780105e-06,
2003
+ "loss": 0.9787,
2004
+ "step": 142500
2005
+ },
2006
+ {
2007
+ "epoch": 18.19801476202596,
2008
+ "grad_norm": 58.86367416381836,
2009
+ "learning_rate": 4.110879773190396e-06,
2010
+ "loss": 0.9875,
2011
+ "step": 143000
2012
+ },
2013
+ {
2014
+ "epoch": 18.261644184270807,
2015
+ "grad_norm": 72.68678283691406,
2016
+ "learning_rate": 4.088730397802782e-06,
2017
+ "loss": 0.987,
2018
+ "step": 143500
2019
+ },
2020
+ {
2021
+ "epoch": 18.32527360651565,
2022
+ "grad_norm": 69.95580291748047,
2023
+ "learning_rate": 4.066581022415168e-06,
2024
+ "loss": 0.9834,
2025
+ "step": 144000
2026
+ },
2027
+ {
2028
+ "epoch": 18.3889030287605,
2029
+ "grad_norm": 63.809104919433594,
2030
+ "learning_rate": 4.044431647027554e-06,
2031
+ "loss": 0.999,
2032
+ "step": 144500
2033
+ },
2034
+ {
2035
+ "epoch": 18.452532451005347,
2036
+ "grad_norm": 76.64576721191406,
2037
+ "learning_rate": 4.02228227163994e-06,
2038
+ "loss": 0.9872,
2039
+ "step": 145000
2040
+ },
2041
+ {
2042
+ "epoch": 18.51616187325019,
2043
+ "grad_norm": 54.77001953125,
2044
+ "learning_rate": 4.000177195003101e-06,
2045
+ "loss": 0.9851,
2046
+ "step": 145500
2047
+ },
2048
+ {
2049
+ "epoch": 18.57979129549504,
2050
+ "grad_norm": 67.22696685791016,
2051
+ "learning_rate": 3.978027819615487e-06,
2052
+ "loss": 0.9986,
2053
+ "step": 146000
2054
+ },
2055
+ {
2056
+ "epoch": 18.643420717739883,
2057
+ "grad_norm": 69.88746643066406,
2058
+ "learning_rate": 3.955878444227873e-06,
2059
+ "loss": 0.9853,
2060
+ "step": 146500
2061
+ },
2062
+ {
2063
+ "epoch": 18.70705013998473,
2064
+ "grad_norm": 66.42214965820312,
2065
+ "learning_rate": 3.933729068840259e-06,
2066
+ "loss": 0.9973,
2067
+ "step": 147000
2068
+ },
2069
+ {
2070
+ "epoch": 18.770679562229574,
2071
+ "grad_norm": 75.5511245727539,
2072
+ "learning_rate": 3.9116682909541955e-06,
2073
+ "loss": 0.988,
2074
+ "step": 147500
2075
+ },
2076
+ {
2077
+ "epoch": 18.834308984474422,
2078
+ "grad_norm": 73.2605209350586,
2079
+ "learning_rate": 3.889518915566581e-06,
2080
+ "loss": 0.999,
2081
+ "step": 148000
2082
+ },
2083
+ {
2084
+ "epoch": 18.897938406719266,
2085
+ "grad_norm": 63.08274841308594,
2086
+ "learning_rate": 3.8673695401789675e-06,
2087
+ "loss": 0.9899,
2088
+ "step": 148500
2089
+ },
2090
+ {
2091
+ "epoch": 18.961567828964114,
2092
+ "grad_norm": 98.51166534423828,
2093
+ "learning_rate": 3.845220164791353e-06,
2094
+ "loss": 1.0053,
2095
+ "step": 149000
2096
+ },
2097
+ {
2098
+ "epoch": 19.025197251208958,
2099
+ "grad_norm": 67.6368408203125,
2100
+ "learning_rate": 3.823070789403739e-06,
2101
+ "loss": 0.9802,
2102
+ "step": 149500
2103
+ },
2104
+ {
2105
+ "epoch": 19.088826673453806,
2106
+ "grad_norm": 71.1702880859375,
2107
+ "learning_rate": 3.800921414016125e-06,
2108
+ "loss": 0.9301,
2109
+ "step": 150000
2110
+ },
2111
+ {
2112
+ "epoch": 19.15245609569865,
2113
+ "grad_norm": 74.88888549804688,
2114
+ "learning_rate": 3.778772038628511e-06,
2115
+ "loss": 0.9295,
2116
+ "step": 150500
2117
+ },
2118
+ {
2119
+ "epoch": 19.216085517943498,
2120
+ "grad_norm": 49.797691345214844,
2121
+ "learning_rate": 3.756622663240897e-06,
2122
+ "loss": 0.9334,
2123
+ "step": 151000
2124
+ },
2125
+ {
2126
+ "epoch": 19.27971494018834,
2127
+ "grad_norm": 49.12934875488281,
2128
+ "learning_rate": 3.734473287853283e-06,
2129
+ "loss": 0.9503,
2130
+ "step": 151500
2131
+ },
2132
+ {
2133
+ "epoch": 19.34334436243319,
2134
+ "grad_norm": 47.530452728271484,
2135
+ "learning_rate": 3.712323912465669e-06,
2136
+ "loss": 0.9161,
2137
+ "step": 152000
2138
+ },
2139
+ {
2140
+ "epoch": 19.406973784678033,
2141
+ "grad_norm": 69.1083984375,
2142
+ "learning_rate": 3.6901745370780546e-06,
2143
+ "loss": 0.9433,
2144
+ "step": 152500
2145
+ },
2146
+ {
2147
+ "epoch": 19.47060320692288,
2148
+ "grad_norm": 62.554718017578125,
2149
+ "learning_rate": 3.6680251616904407e-06,
2150
+ "loss": 0.9376,
2151
+ "step": 153000
2152
+ },
2153
+ {
2154
+ "epoch": 19.53423262916773,
2155
+ "grad_norm": 55.151100158691406,
2156
+ "learning_rate": 3.645920085053602e-06,
2157
+ "loss": 0.9274,
2158
+ "step": 153500
2159
+ },
2160
+ {
2161
+ "epoch": 19.597862051412573,
2162
+ "grad_norm": 60.6050910949707,
2163
+ "learning_rate": 3.623770709665988e-06,
2164
+ "loss": 0.9414,
2165
+ "step": 154000
2166
+ },
2167
+ {
2168
+ "epoch": 19.66149147365742,
2169
+ "grad_norm": 55.62131118774414,
2170
+ "learning_rate": 3.6016213342783736e-06,
2171
+ "loss": 0.94,
2172
+ "step": 154500
2173
+ },
2174
+ {
2175
+ "epoch": 19.725120895902265,
2176
+ "grad_norm": 59.69659423828125,
2177
+ "learning_rate": 3.5794719588907597e-06,
2178
+ "loss": 0.9344,
2179
+ "step": 155000
2180
+ },
2181
+ {
2182
+ "epoch": 19.788750318147112,
2183
+ "grad_norm": 46.444984436035156,
2184
+ "learning_rate": 3.557366882253921e-06,
2185
+ "loss": 0.9464,
2186
+ "step": 155500
2187
+ },
2188
+ {
2189
+ "epoch": 19.852379740391957,
2190
+ "grad_norm": 63.4849739074707,
2191
+ "learning_rate": 3.535217506866307e-06,
2192
+ "loss": 0.9583,
2193
+ "step": 156000
2194
+ },
2195
+ {
2196
+ "epoch": 19.916009162636804,
2197
+ "grad_norm": 69.28148651123047,
2198
+ "learning_rate": 3.5130681314786926e-06,
2199
+ "loss": 0.953,
2200
+ "step": 156500
2201
+ },
2202
+ {
2203
+ "epoch": 19.97963858488165,
2204
+ "grad_norm": 127.65802764892578,
2205
+ "learning_rate": 3.4909187560910782e-06,
2206
+ "loss": 0.9481,
2207
+ "step": 157000
2208
+ },
2209
+ {
2210
+ "epoch": 20.043268007126496,
2211
+ "grad_norm": 64.03028106689453,
2212
+ "learning_rate": 3.46881367945424e-06,
2213
+ "loss": 0.8982,
2214
+ "step": 157500
2215
+ },
2216
+ {
2217
+ "epoch": 20.10689742937134,
2218
+ "grad_norm": 68.30170440673828,
2219
+ "learning_rate": 3.446664304066625e-06,
2220
+ "loss": 0.8974,
2221
+ "step": 158000
2222
+ },
2223
+ {
2224
+ "epoch": 20.170526851616188,
2225
+ "grad_norm": 44.01250457763672,
2226
+ "learning_rate": 3.424514928679011e-06,
2227
+ "loss": 0.9022,
2228
+ "step": 158500
2229
+ },
2230
+ {
2231
+ "epoch": 20.234156273861032,
2232
+ "grad_norm": 65.26950073242188,
2233
+ "learning_rate": 3.4023655532913972e-06,
2234
+ "loss": 0.8923,
2235
+ "step": 159000
2236
+ },
2237
+ {
2238
+ "epoch": 20.29778569610588,
2239
+ "grad_norm": 46.552730560302734,
2240
+ "learning_rate": 3.380260476654559e-06,
2241
+ "loss": 0.8935,
2242
+ "step": 159500
2243
+ },
2244
+ {
2245
+ "epoch": 20.361415118350724,
2246
+ "grad_norm": 59.73283767700195,
2247
+ "learning_rate": 3.358111101266944e-06,
2248
+ "loss": 0.8917,
2249
+ "step": 160000
2250
+ },
2251
+ {
2252
+ "epoch": 20.42504454059557,
2253
+ "grad_norm": 84.19660186767578,
2254
+ "learning_rate": 3.33596172587933e-06,
2255
+ "loss": 0.9021,
2256
+ "step": 160500
2257
+ },
2258
+ {
2259
+ "epoch": 20.488673962840416,
2260
+ "grad_norm": 48.705833435058594,
2261
+ "learning_rate": 3.3138123504917162e-06,
2262
+ "loss": 0.8978,
2263
+ "step": 161000
2264
+ },
2265
+ {
2266
+ "epoch": 20.552303385085263,
2267
+ "grad_norm": 44.999656677246094,
2268
+ "learning_rate": 3.2916629751041023e-06,
2269
+ "loss": 0.9078,
2270
+ "step": 161500
2271
+ },
2272
+ {
2273
+ "epoch": 20.61593280733011,
2274
+ "grad_norm": 62.21163558959961,
2275
+ "learning_rate": 3.2695135997164883e-06,
2276
+ "loss": 0.903,
2277
+ "step": 162000
2278
+ },
2279
+ {
2280
+ "epoch": 20.679562229574955,
2281
+ "grad_norm": 61.40314483642578,
2282
+ "learning_rate": 3.247408523079649e-06,
2283
+ "loss": 0.8989,
2284
+ "step": 162500
2285
+ },
2286
+ {
2287
+ "epoch": 20.743191651819803,
2288
+ "grad_norm": 55.93895721435547,
2289
+ "learning_rate": 3.2252591476920352e-06,
2290
+ "loss": 0.9023,
2291
+ "step": 163000
2292
+ },
2293
+ {
2294
+ "epoch": 20.806821074064647,
2295
+ "grad_norm": 64.3861312866211,
2296
+ "learning_rate": 3.2031097723044213e-06,
2297
+ "loss": 0.8918,
2298
+ "step": 163500
2299
+ },
2300
+ {
2301
+ "epoch": 20.870450496309495,
2302
+ "grad_norm": 92.62686157226562,
2303
+ "learning_rate": 3.1809603969168073e-06,
2304
+ "loss": 0.8968,
2305
+ "step": 164000
2306
+ },
2307
+ {
2308
+ "epoch": 20.93407991855434,
2309
+ "grad_norm": 58.37923049926758,
2310
+ "learning_rate": 3.1588110215291934e-06,
2311
+ "loss": 0.8977,
2312
+ "step": 164500
2313
+ },
2314
+ {
2315
+ "epoch": 20.997709340799187,
2316
+ "grad_norm": 49.36125564575195,
2317
+ "learning_rate": 3.136661646141579e-06,
2318
+ "loss": 0.9035,
2319
+ "step": 165000
2320
+ },
2321
+ {
2322
+ "epoch": 21.06133876304403,
2323
+ "grad_norm": 69.00907135009766,
2324
+ "learning_rate": 3.114512270753965e-06,
2325
+ "loss": 0.8347,
2326
+ "step": 165500
2327
+ },
2328
+ {
2329
+ "epoch": 21.12496818528888,
2330
+ "grad_norm": 56.581600189208984,
2331
+ "learning_rate": 3.0924071941171263e-06,
2332
+ "loss": 0.8415,
2333
+ "step": 166000
2334
+ },
2335
+ {
2336
+ "epoch": 21.188597607533723,
2337
+ "grad_norm": 56.39949417114258,
2338
+ "learning_rate": 3.0702578187295124e-06,
2339
+ "loss": 0.8472,
2340
+ "step": 166500
2341
+ },
2342
+ {
2343
+ "epoch": 21.25222702977857,
2344
+ "grad_norm": 58.36819839477539,
2345
+ "learning_rate": 3.048108443341898e-06,
2346
+ "loss": 0.8663,
2347
+ "step": 167000
2348
+ },
2349
+ {
2350
+ "epoch": 21.315856452023414,
2351
+ "grad_norm": 59.97529602050781,
2352
+ "learning_rate": 3.025959067954284e-06,
2353
+ "loss": 0.8633,
2354
+ "step": 167500
2355
+ },
2356
+ {
2357
+ "epoch": 21.379485874268262,
2358
+ "grad_norm": 52.96846008300781,
2359
+ "learning_rate": 3.00380969256667e-06,
2360
+ "loss": 0.8569,
2361
+ "step": 168000
2362
+ },
2363
+ {
2364
+ "epoch": 21.443115296513106,
2365
+ "grad_norm": 52.254085540771484,
2366
+ "learning_rate": 2.981660317179056e-06,
2367
+ "loss": 0.8529,
2368
+ "step": 168500
2369
+ },
2370
+ {
2371
+ "epoch": 21.506744718757954,
2372
+ "grad_norm": 80.7918472290039,
2373
+ "learning_rate": 2.959555240542217e-06,
2374
+ "loss": 0.8485,
2375
+ "step": 169000
2376
+ },
2377
+ {
2378
+ "epoch": 21.570374141002798,
2379
+ "grad_norm": 59.65958023071289,
2380
+ "learning_rate": 2.9374058651546026e-06,
2381
+ "loss": 0.8759,
2382
+ "step": 169500
2383
+ },
2384
+ {
2385
+ "epoch": 21.634003563247646,
2386
+ "grad_norm": 48.33919906616211,
2387
+ "learning_rate": 2.9152564897669886e-06,
2388
+ "loss": 0.8667,
2389
+ "step": 170000
2390
+ },
2391
+ {
2392
+ "epoch": 21.697632985492493,
2393
+ "grad_norm": 68.83987426757812,
2394
+ "learning_rate": 2.8931071143793747e-06,
2395
+ "loss": 0.8615,
2396
+ "step": 170500
2397
+ },
2398
+ {
2399
+ "epoch": 21.761262407737338,
2400
+ "grad_norm": 42.605552673339844,
2401
+ "learning_rate": 2.8709577389917607e-06,
2402
+ "loss": 0.8623,
2403
+ "step": 171000
2404
+ },
2405
+ {
2406
+ "epoch": 21.824891829982185,
2407
+ "grad_norm": 57.37046432495117,
2408
+ "learning_rate": 2.8488083636041464e-06,
2409
+ "loss": 0.8613,
2410
+ "step": 171500
2411
+ },
2412
+ {
2413
+ "epoch": 21.88852125222703,
2414
+ "grad_norm": 66.89559173583984,
2415
+ "learning_rate": 2.8266589882165324e-06,
2416
+ "loss": 0.8515,
2417
+ "step": 172000
2418
+ },
2419
+ {
2420
+ "epoch": 21.952150674471877,
2421
+ "grad_norm": 53.939571380615234,
2422
+ "learning_rate": 2.8045096128289184e-06,
2423
+ "loss": 0.8615,
2424
+ "step": 172500
2425
+ },
2426
+ {
2427
+ "epoch": 22.01578009671672,
2428
+ "grad_norm": 61.67373275756836,
2429
+ "learning_rate": 2.7824045361920797e-06,
2430
+ "loss": 0.8457,
2431
+ "step": 173000
2432
+ },
2433
+ {
2434
+ "epoch": 22.07940951896157,
2435
+ "grad_norm": 71.31520080566406,
2436
+ "learning_rate": 2.7602551608044653e-06,
2437
+ "loss": 0.8106,
2438
+ "step": 173500
2439
+ },
2440
+ {
2441
+ "epoch": 22.143038941206413,
2442
+ "grad_norm": 44.70698165893555,
2443
+ "learning_rate": 2.7381057854168514e-06,
2444
+ "loss": 0.8109,
2445
+ "step": 174000
2446
+ },
2447
+ {
2448
+ "epoch": 22.20666836345126,
2449
+ "grad_norm": 43.95622253417969,
2450
+ "learning_rate": 2.7159564100292374e-06,
2451
+ "loss": 0.8108,
2452
+ "step": 174500
2453
+ },
2454
+ {
2455
+ "epoch": 22.270297785696105,
2456
+ "grad_norm": 55.156822204589844,
2457
+ "learning_rate": 2.6938513333923987e-06,
2458
+ "loss": 0.8197,
2459
+ "step": 175000
2460
+ },
2461
+ {
2462
+ "epoch": 22.333927207940953,
2463
+ "grad_norm": 85.59542846679688,
2464
+ "learning_rate": 2.6717019580047843e-06,
2465
+ "loss": 0.8165,
2466
+ "step": 175500
2467
+ },
2468
+ {
2469
+ "epoch": 22.397556630185797,
2470
+ "grad_norm": 65.07913208007812,
2471
+ "learning_rate": 2.6495525826171704e-06,
2472
+ "loss": 0.8289,
2473
+ "step": 176000
2474
+ },
2475
+ {
2476
+ "epoch": 22.461186052430644,
2477
+ "grad_norm": 65.89120483398438,
2478
+ "learning_rate": 2.6274032072295564e-06,
2479
+ "loss": 0.8288,
2480
+ "step": 176500
2481
+ },
2482
+ {
2483
+ "epoch": 22.52481547467549,
2484
+ "grad_norm": 55.914939880371094,
2485
+ "learning_rate": 2.6052981305927177e-06,
2486
+ "loss": 0.8145,
2487
+ "step": 177000
2488
+ },
2489
+ {
2490
+ "epoch": 22.588444896920336,
2491
+ "grad_norm": 80.44625854492188,
2492
+ "learning_rate": 2.5831487552051033e-06,
2493
+ "loss": 0.8249,
2494
+ "step": 177500
2495
+ },
2496
+ {
2497
+ "epoch": 22.65207431916518,
2498
+ "grad_norm": 65.78691101074219,
2499
+ "learning_rate": 2.5609993798174894e-06,
2500
+ "loss": 0.8218,
2501
+ "step": 178000
2502
+ },
2503
+ {
2504
+ "epoch": 22.715703741410028,
2505
+ "grad_norm": 41.24105453491211,
2506
+ "learning_rate": 2.5388500044298754e-06,
2507
+ "loss": 0.8284,
2508
+ "step": 178500
2509
+ },
2510
+ {
2511
+ "epoch": 22.779333163654876,
2512
+ "grad_norm": 64.46809387207031,
2513
+ "learning_rate": 2.5167892265438115e-06,
2514
+ "loss": 0.833,
2515
+ "step": 179000
2516
+ },
2517
+ {
2518
+ "epoch": 22.84296258589972,
2519
+ "grad_norm": 56.47655487060547,
2520
+ "learning_rate": 2.4946398511561976e-06,
2521
+ "loss": 0.8176,
2522
+ "step": 179500
2523
+ },
2524
+ {
2525
+ "epoch": 22.906592008144568,
2526
+ "grad_norm": 71.83133697509766,
2527
+ "learning_rate": 2.4724904757685836e-06,
2528
+ "loss": 0.8431,
2529
+ "step": 180000
2530
+ },
2531
+ {
2532
+ "epoch": 22.97022143038941,
2533
+ "grad_norm": 53.66551971435547,
2534
+ "learning_rate": 2.450385399131745e-06,
2535
+ "loss": 0.8234,
2536
+ "step": 180500
2537
+ },
2538
+ {
2539
+ "epoch": 23.03385085263426,
2540
+ "grad_norm": 78.90325927734375,
2541
+ "learning_rate": 2.4282360237441305e-06,
2542
+ "loss": 0.7998,
2543
+ "step": 181000
2544
+ },
2545
+ {
2546
+ "epoch": 23.097480274879103,
2547
+ "grad_norm": 64.31370544433594,
2548
+ "learning_rate": 2.4060866483565166e-06,
2549
+ "loss": 0.7821,
2550
+ "step": 181500
2551
+ },
2552
+ {
2553
+ "epoch": 23.16110969712395,
2554
+ "grad_norm": 48.68030548095703,
2555
+ "learning_rate": 2.3839372729689026e-06,
2556
+ "loss": 0.7914,
2557
+ "step": 182000
2558
+ },
2559
+ {
2560
+ "epoch": 23.224739119368795,
2561
+ "grad_norm": 52.06983184814453,
2562
+ "learning_rate": 2.3617878975812882e-06,
2563
+ "loss": 0.7851,
2564
+ "step": 182500
2565
+ },
2566
+ {
2567
+ "epoch": 23.288368541613643,
2568
+ "grad_norm": 50.310157775878906,
2569
+ "learning_rate": 2.3396385221936743e-06,
2570
+ "loss": 0.7797,
2571
+ "step": 183000
2572
+ },
2573
+ {
2574
+ "epoch": 23.351997963858487,
2575
+ "grad_norm": 52.41871643066406,
2576
+ "learning_rate": 2.3174891468060603e-06,
2577
+ "loss": 0.7931,
2578
+ "step": 183500
2579
+ },
2580
+ {
2581
+ "epoch": 23.415627386103335,
2582
+ "grad_norm": 88.78260040283203,
2583
+ "learning_rate": 2.295339771418446e-06,
2584
+ "loss": 0.7912,
2585
+ "step": 184000
2586
+ },
2587
+ {
2588
+ "epoch": 23.47925680834818,
2589
+ "grad_norm": 62.528663635253906,
2590
+ "learning_rate": 2.273190396030832e-06,
2591
+ "loss": 0.7876,
2592
+ "step": 184500
2593
+ },
2594
+ {
2595
+ "epoch": 23.542886230593027,
2596
+ "grad_norm": 46.27097702026367,
2597
+ "learning_rate": 2.251041020643218e-06,
2598
+ "loss": 0.7954,
2599
+ "step": 185000
2600
+ },
2601
+ {
2602
+ "epoch": 23.60651565283787,
2603
+ "grad_norm": 50.20694351196289,
2604
+ "learning_rate": 2.228891645255604e-06,
2605
+ "loss": 0.7946,
2606
+ "step": 185500
2607
+ },
2608
+ {
2609
+ "epoch": 23.67014507508272,
2610
+ "grad_norm": 56.892765045166016,
2611
+ "learning_rate": 2.20674226986799e-06,
2612
+ "loss": 0.7782,
2613
+ "step": 186000
2614
+ },
2615
+ {
2616
+ "epoch": 23.733774497327563,
2617
+ "grad_norm": 41.52644729614258,
2618
+ "learning_rate": 2.184637193231151e-06,
2619
+ "loss": 0.7952,
2620
+ "step": 186500
2621
+ },
2622
+ {
2623
+ "epoch": 23.79740391957241,
2624
+ "grad_norm": 58.025516510009766,
2625
+ "learning_rate": 2.162487817843537e-06,
2626
+ "loss": 0.8015,
2627
+ "step": 187000
2628
+ },
2629
+ {
2630
+ "epoch": 23.861033341817258,
2631
+ "grad_norm": 48.62569046020508,
2632
+ "learning_rate": 2.140338442455923e-06,
2633
+ "loss": 0.7977,
2634
+ "step": 187500
2635
+ },
2636
+ {
2637
+ "epoch": 23.924662764062102,
2638
+ "grad_norm": 46.91473388671875,
2639
+ "learning_rate": 2.1181890670683087e-06,
2640
+ "loss": 0.7875,
2641
+ "step": 188000
2642
+ },
2643
+ {
2644
+ "epoch": 23.98829218630695,
2645
+ "grad_norm": 52.42847442626953,
2646
+ "learning_rate": 2.09608399043147e-06,
2647
+ "loss": 0.7935,
2648
+ "step": 188500
2649
+ },
2650
+ {
2651
+ "epoch": 24.051921608551794,
2652
+ "grad_norm": 76.6783676147461,
2653
+ "learning_rate": 2.073934615043856e-06,
2654
+ "loss": 0.7617,
2655
+ "step": 189000
2656
+ },
2657
+ {
2658
+ "epoch": 24.11555103079664,
2659
+ "grad_norm": 67.17424011230469,
2660
+ "learning_rate": 2.0517852396562417e-06,
2661
+ "loss": 0.7625,
2662
+ "step": 189500
2663
+ },
2664
+ {
2665
+ "epoch": 24.179180453041486,
2666
+ "grad_norm": 50.021053314208984,
2667
+ "learning_rate": 2.0296358642686277e-06,
2668
+ "loss": 0.7514,
2669
+ "step": 190000
2670
+ },
2671
+ {
2672
+ "epoch": 24.242809875286333,
2673
+ "grad_norm": 53.048465728759766,
2674
+ "learning_rate": 2.0074864888810137e-06,
2675
+ "loss": 0.7662,
2676
+ "step": 190500
2677
+ },
2678
+ {
2679
+ "epoch": 24.306439297531178,
2680
+ "grad_norm": 67.73706817626953,
2681
+ "learning_rate": 1.9854257109949503e-06,
2682
+ "loss": 0.7692,
2683
+ "step": 191000
2684
+ },
2685
+ {
2686
+ "epoch": 24.370068719776025,
2687
+ "grad_norm": 57.47793960571289,
2688
+ "learning_rate": 1.9632763356073363e-06,
2689
+ "loss": 0.7733,
2690
+ "step": 191500
2691
+ },
2692
+ {
2693
+ "epoch": 24.43369814202087,
2694
+ "grad_norm": 59.039405822753906,
2695
+ "learning_rate": 1.941126960219722e-06,
2696
+ "loss": 0.7561,
2697
+ "step": 192000
2698
+ },
2699
+ {
2700
+ "epoch": 24.497327564265717,
2701
+ "grad_norm": 46.10505676269531,
2702
+ "learning_rate": 1.918977584832108e-06,
2703
+ "loss": 0.7577,
2704
+ "step": 192500
2705
+ },
2706
+ {
2707
+ "epoch": 24.56095698651056,
2708
+ "grad_norm": 85.20184326171875,
2709
+ "learning_rate": 1.8968282094444936e-06,
2710
+ "loss": 0.7687,
2711
+ "step": 193000
2712
+ },
2713
+ {
2714
+ "epoch": 24.62458640875541,
2715
+ "grad_norm": 53.42023849487305,
2716
+ "learning_rate": 1.8746788340568796e-06,
2717
+ "loss": 0.7647,
2718
+ "step": 193500
2719
+ },
2720
+ {
2721
+ "epoch": 24.688215831000253,
2722
+ "grad_norm": 63.070919036865234,
2723
+ "learning_rate": 1.852573757420041e-06,
2724
+ "loss": 0.7717,
2725
+ "step": 194000
2726
+ },
2727
+ {
2728
+ "epoch": 24.7518452532451,
2729
+ "grad_norm": 59.02709197998047,
2730
+ "learning_rate": 1.830424382032427e-06,
2731
+ "loss": 0.761,
2732
+ "step": 194500
2733
+ },
2734
+ {
2735
+ "epoch": 24.815474675489945,
2736
+ "grad_norm": 47.43505859375,
2737
+ "learning_rate": 1.8082750066448126e-06,
2738
+ "loss": 0.7661,
2739
+ "step": 195000
2740
+ },
2741
+ {
2742
+ "epoch": 24.879104097734793,
2743
+ "grad_norm": 79.37848663330078,
2744
+ "learning_rate": 1.7861256312571986e-06,
2745
+ "loss": 0.7446,
2746
+ "step": 195500
2747
+ },
2748
+ {
2749
+ "epoch": 24.94273351997964,
2750
+ "grad_norm": 51.29045104980469,
2751
+ "learning_rate": 1.7639762558695847e-06,
2752
+ "loss": 0.7659,
2753
+ "step": 196000
2754
+ },
2755
+ {
2756
+ "epoch": 25.006362942224484,
2757
+ "grad_norm": 43.27066421508789,
2758
+ "learning_rate": 1.7418711792327458e-06,
2759
+ "loss": 0.7559,
2760
+ "step": 196500
2761
+ },
2762
+ {
2763
+ "epoch": 25.069992364469332,
2764
+ "grad_norm": 45.82556915283203,
2765
+ "learning_rate": 1.7197218038451316e-06,
2766
+ "loss": 0.7183,
2767
+ "step": 197000
2768
+ },
2769
+ {
2770
+ "epoch": 25.133621786714176,
2771
+ "grad_norm": 61.48518753051758,
2772
+ "learning_rate": 1.6975724284575176e-06,
2773
+ "loss": 0.7399,
2774
+ "step": 197500
2775
+ },
2776
+ {
2777
+ "epoch": 25.197251208959024,
2778
+ "grad_norm": 56.30770492553711,
2779
+ "learning_rate": 1.6754230530699037e-06,
2780
+ "loss": 0.7308,
2781
+ "step": 198000
2782
+ },
2783
+ {
2784
+ "epoch": 25.260880631203868,
2785
+ "grad_norm": 77.96941375732422,
2786
+ "learning_rate": 1.6532736776822895e-06,
2787
+ "loss": 0.733,
2788
+ "step": 198500
2789
+ },
2790
+ {
2791
+ "epoch": 25.324510053448716,
2792
+ "grad_norm": 75.03907775878906,
2793
+ "learning_rate": 1.6311243022946753e-06,
2794
+ "loss": 0.746,
2795
+ "step": 199000
2796
+ },
2797
+ {
2798
+ "epoch": 25.38813947569356,
2799
+ "grad_norm": 49.624366760253906,
2800
+ "learning_rate": 1.6089749269070614e-06,
2801
+ "loss": 0.7274,
2802
+ "step": 199500
2803
+ },
2804
+ {
2805
+ "epoch": 25.451768897938408,
2806
+ "grad_norm": 54.31991195678711,
2807
+ "learning_rate": 1.5868255515194472e-06,
2808
+ "loss": 0.7358,
2809
+ "step": 200000
2810
+ },
2811
+ {
2812
+ "epoch": 25.51539832018325,
2813
+ "grad_norm": 57.31879425048828,
2814
+ "learning_rate": 1.5646761761318333e-06,
2815
+ "loss": 0.7468,
2816
+ "step": 200500
2817
+ },
2818
+ {
2819
+ "epoch": 25.5790277424281,
2820
+ "grad_norm": 51.115596771240234,
2821
+ "learning_rate": 1.5425710994949943e-06,
2822
+ "loss": 0.734,
2823
+ "step": 201000
2824
+ },
2825
+ {
2826
+ "epoch": 25.642657164672944,
2827
+ "grad_norm": 68.8400650024414,
2828
+ "learning_rate": 1.5204660228581556e-06,
2829
+ "loss": 0.7493,
2830
+ "step": 201500
2831
+ },
2832
+ {
2833
+ "epoch": 25.70628658691779,
2834
+ "grad_norm": 40.318153381347656,
2835
+ "learning_rate": 1.4983166474705415e-06,
2836
+ "loss": 0.7263,
2837
+ "step": 202000
2838
+ },
2839
+ {
2840
+ "epoch": 25.769916009162635,
2841
+ "grad_norm": 63.84051513671875,
2842
+ "learning_rate": 1.4761672720829273e-06,
2843
+ "loss": 0.7355,
2844
+ "step": 202500
2845
+ },
2846
+ {
2847
+ "epoch": 25.833545431407483,
2848
+ "grad_norm": 61.53810501098633,
2849
+ "learning_rate": 1.4540178966953133e-06,
2850
+ "loss": 0.745,
2851
+ "step": 203000
2852
+ },
2853
+ {
2854
+ "epoch": 25.897174853652327,
2855
+ "grad_norm": 51.513187408447266,
2856
+ "learning_rate": 1.4318685213076994e-06,
2857
+ "loss": 0.7301,
2858
+ "step": 203500
2859
+ },
2860
+ {
2861
+ "epoch": 25.960804275897175,
2862
+ "grad_norm": 65.9397201538086,
2863
+ "learning_rate": 1.4097191459200852e-06,
+ "loss": 0.7457,
+ "step": 204000
+ },
+ {
+ "epoch": 26.024433698142023,
+ "grad_norm": 59.864540100097656,
+ "learning_rate": 1.3875697705324713e-06,
+ "loss": 0.7072,
+ "step": 204500
+ },
+ {
+ "epoch": 26.088063120386867,
+ "grad_norm": 52.43559646606445,
+ "learning_rate": 1.3654203951448569e-06,
+ "loss": 0.7212,
+ "step": 205000
+ },
+ {
+ "epoch": 26.151692542631714,
+ "grad_norm": 45.977447509765625,
+ "learning_rate": 1.343315318508018e-06,
+ "loss": 0.7186,
+ "step": 205500
+ },
+ {
+ "epoch": 26.21532196487656,
+ "grad_norm": 48.874996185302734,
+ "learning_rate": 1.321165943120404e-06,
+ "loss": 0.7225,
+ "step": 206000
+ },
+ {
+ "epoch": 26.278951387121406,
+ "grad_norm": 57.50956344604492,
+ "learning_rate": 1.29901656773279e-06,
+ "loss": 0.7065,
+ "step": 206500
+ },
+ {
+ "epoch": 26.34258080936625,
+ "grad_norm": 64.73326110839844,
+ "learning_rate": 1.2768671923451759e-06,
+ "loss": 0.7153,
+ "step": 207000
+ },
+ {
+ "epoch": 26.406210231611098,
+ "grad_norm": 53.96969223022461,
+ "learning_rate": 1.254762115708337e-06,
+ "loss": 0.72,
+ "step": 207500
+ },
+ {
+ "epoch": 26.469839653855942,
+ "grad_norm": 56.118255615234375,
+ "learning_rate": 1.232612740320723e-06,
+ "loss": 0.7074,
+ "step": 208000
+ },
+ {
+ "epoch": 26.53346907610079,
+ "grad_norm": 57.4562873840332,
+ "learning_rate": 1.2104633649331088e-06,
+ "loss": 0.7117,
+ "step": 208500
+ },
+ {
+ "epoch": 26.597098498345634,
+ "grad_norm": 60.367919921875,
+ "learning_rate": 1.1883139895454949e-06,
+ "loss": 0.7206,
+ "step": 209000
+ },
+ {
+ "epoch": 26.66072792059048,
+ "grad_norm": 55.18882369995117,
+ "learning_rate": 1.166164614157881e-06,
+ "loss": 0.7132,
+ "step": 209500
+ },
+ {
+ "epoch": 26.724357342835326,
+ "grad_norm": 48.3643798828125,
+ "learning_rate": 1.144059537521042e-06,
+ "loss": 0.7199,
+ "step": 210000
+ },
+ {
+ "epoch": 26.787986765080174,
+ "grad_norm": 50.825225830078125,
+ "learning_rate": 1.1219101621334278e-06,
+ "loss": 0.7102,
+ "step": 210500
+ },
+ {
+ "epoch": 26.851616187325018,
+ "grad_norm": 36.502899169921875,
+ "learning_rate": 1.0997607867458139e-06,
+ "loss": 0.7155,
+ "step": 211000
+ },
+ {
+ "epoch": 26.915245609569865,
+ "grad_norm": 58.10041809082031,
+ "learning_rate": 1.0776114113581997e-06,
+ "loss": 0.7057,
+ "step": 211500
+ },
+ {
+ "epoch": 26.97887503181471,
+ "grad_norm": 41.11175537109375,
+ "learning_rate": 1.055506334721361e-06,
+ "loss": 0.7191,
+ "step": 212000
+ },
+ {
+ "epoch": 27.042504454059557,
+ "grad_norm": 56.8629150390625,
+ "learning_rate": 1.0333569593337468e-06,
+ "loss": 0.6942,
+ "step": 212500
+ },
+ {
+ "epoch": 27.106133876304405,
+ "grad_norm": 43.03855514526367,
+ "learning_rate": 1.011251882696908e-06,
+ "loss": 0.6924,
+ "step": 213000
+ },
+ {
+ "epoch": 27.16976329854925,
+ "grad_norm": 41.03914260864258,
+ "learning_rate": 9.89102507309294e-07,
+ "loss": 0.7025,
+ "step": 213500
+ },
+ {
+ "epoch": 27.233392720794097,
+ "grad_norm": 44.40423583984375,
+ "learning_rate": 9.6695313192168e-07,
+ "loss": 0.6911,
+ "step": 214000
+ },
+ {
+ "epoch": 27.29702214303894,
+ "grad_norm": 48.28982925415039,
+ "learning_rate": 9.448037565340658e-07,
+ "loss": 0.6955,
+ "step": 214500
+ },
+ {
+ "epoch": 27.36065156528379,
+ "grad_norm": 58.85805130004883,
+ "learning_rate": 9.226543811464518e-07,
+ "loss": 0.6875,
+ "step": 215000
+ },
+ {
+ "epoch": 27.424280987528633,
+ "grad_norm": 58.64131164550781,
+ "learning_rate": 9.005050057588377e-07,
+ "loss": 0.698,
+ "step": 215500
+ },
+ {
+ "epoch": 27.48791040977348,
+ "grad_norm": 54.10153579711914,
+ "learning_rate": 8.783999291219989e-07,
+ "loss": 0.7054,
+ "step": 216000
+ },
+ {
+ "epoch": 27.551539832018324,
+ "grad_norm": 37.60294723510742,
+ "learning_rate": 8.562505537343847e-07,
+ "loss": 0.6968,
+ "step": 216500
+ },
+ {
+ "epoch": 27.615169254263172,
+ "grad_norm": 46.13175964355469,
+ "learning_rate": 8.341011783467707e-07,
+ "loss": 0.7044,
+ "step": 217000
+ },
+ {
+ "epoch": 27.678798676508016,
+ "grad_norm": 49.83407211303711,
+ "learning_rate": 8.119518029591567e-07,
+ "loss": 0.6946,
+ "step": 217500
+ },
+ {
+ "epoch": 27.742428098752864,
+ "grad_norm": 62.65508270263672,
+ "learning_rate": 7.898024275715425e-07,
+ "loss": 0.6865,
+ "step": 218000
+ },
+ {
+ "epoch": 27.806057520997708,
+ "grad_norm": 64.78981018066406,
+ "learning_rate": 7.676530521839285e-07,
+ "loss": 0.6974,
+ "step": 218500
+ },
+ {
+ "epoch": 27.869686943242556,
+ "grad_norm": 55.65605926513672,
+ "learning_rate": 7.455479755470895e-07,
+ "loss": 0.698,
+ "step": 219000
+ },
+ {
+ "epoch": 27.9333163654874,
+ "grad_norm": 51.40291976928711,
+ "learning_rate": 7.233986001594756e-07,
+ "loss": 0.6943,
+ "step": 219500
+ },
+ {
+ "epoch": 27.996945787732248,
+ "grad_norm": 52.821475982666016,
+ "learning_rate": 7.012492247718615e-07,
+ "loss": 0.6985,
+ "step": 220000
+ },
+ {
+ "epoch": 28.06057520997709,
+ "grad_norm": 105.65634155273438,
+ "learning_rate": 6.790998493842474e-07,
+ "loss": 0.6785,
+ "step": 220500
+ },
+ {
+ "epoch": 28.12420463222194,
+ "grad_norm": 66.97595977783203,
+ "learning_rate": 6.569504739966333e-07,
+ "loss": 0.6842,
+ "step": 221000
+ },
+ {
+ "epoch": 28.187834054466787,
+ "grad_norm": 77.8376693725586,
+ "learning_rate": 6.348010986090193e-07,
+ "loss": 0.6832,
+ "step": 221500
+ },
+ {
+ "epoch": 28.25146347671163,
+ "grad_norm": 68.83918762207031,
+ "learning_rate": 6.126517232214052e-07,
+ "loss": 0.6863,
+ "step": 222000
+ },
+ {
+ "epoch": 28.31509289895648,
+ "grad_norm": 49.16581344604492,
+ "learning_rate": 5.905023478337911e-07,
+ "loss": 0.6806,
+ "step": 222500
+ },
+ {
+ "epoch": 28.378722321201323,
+ "grad_norm": 58.93035888671875,
+ "learning_rate": 5.683972711969523e-07,
+ "loss": 0.6897,
+ "step": 223000
+ },
+ {
+ "epoch": 28.44235174344617,
+ "grad_norm": 57.476531982421875,
+ "learning_rate": 5.462478958093382e-07,
+ "loss": 0.6975,
+ "step": 223500
+ },
+ {
+ "epoch": 28.505981165691015,
+ "grad_norm": 56.7477912902832,
+ "learning_rate": 5.240985204217242e-07,
+ "loss": 0.6802,
+ "step": 224000
+ },
+ {
+ "epoch": 28.569610587935863,
+ "grad_norm": 45.01617431640625,
+ "learning_rate": 5.0194914503411e-07,
+ "loss": 0.6836,
+ "step": 224500
+ },
+ {
+ "epoch": 28.633240010180707,
+ "grad_norm": 49.77652359008789,
+ "learning_rate": 4.79799769646496e-07,
+ "loss": 0.6849,
+ "step": 225000
+ },
+ {
+ "epoch": 28.696869432425554,
+ "grad_norm": 52.57892990112305,
+ "learning_rate": 4.57650394258882e-07,
+ "loss": 0.6781,
+ "step": 225500
+ },
+ {
+ "epoch": 28.7604988546704,
+ "grad_norm": 64.97437286376953,
+ "learning_rate": 4.3550101887126787e-07,
+ "loss": 0.6761,
+ "step": 226000
+ },
+ {
+ "epoch": 28.824128276915246,
+ "grad_norm": 52.035160064697266,
+ "learning_rate": 4.133516434836538e-07,
+ "loss": 0.6762,
+ "step": 226500
+ },
+ {
+ "epoch": 28.88775769916009,
+ "grad_norm": 57.393035888671875,
+ "learning_rate": 3.912022680960397e-07,
+ "loss": 0.6781,
+ "step": 227000
+ },
+ {
+ "epoch": 28.951387121404938,
+ "grad_norm": 49.78774642944336,
+ "learning_rate": 3.691414902099761e-07,
+ "loss": 0.682,
+ "step": 227500
+ },
+ {
+ "epoch": 29.015016543649782,
+ "grad_norm": 47.4661750793457,
+ "learning_rate": 3.4699211482236206e-07,
+ "loss": 0.6742,
+ "step": 228000
+ },
+ {
+ "epoch": 29.07864596589463,
+ "grad_norm": 69.56925964355469,
+ "learning_rate": 3.248870381855232e-07,
+ "loss": 0.6595,
+ "step": 228500
+ },
+ {
+ "epoch": 29.142275388139474,
+ "grad_norm": 49.844520568847656,
+ "learning_rate": 3.027376627979091e-07,
+ "loss": 0.683,
+ "step": 229000
+ },
+ {
+ "epoch": 29.20590481038432,
+ "grad_norm": 58.6362419128418,
+ "learning_rate": 2.8058828741029506e-07,
+ "loss": 0.6721,
+ "step": 229500
+ },
+ {
+ "epoch": 29.26953423262917,
+ "grad_norm": 44.214717864990234,
+ "learning_rate": 2.5843891202268095e-07,
+ "loss": 0.669,
+ "step": 230000
+ },
+ {
+ "epoch": 29.333163654874014,
+ "grad_norm": 50.08256530761719,
+ "learning_rate": 2.3628953663506691e-07,
+ "loss": 0.683,
+ "step": 230500
+ },
+ {
+ "epoch": 29.39679307711886,
+ "grad_norm": 51.15972900390625,
+ "learning_rate": 2.1414016124745283e-07,
+ "loss": 0.6652,
+ "step": 231000
+ },
+ {
+ "epoch": 29.460422499363705,
+ "grad_norm": 47.7255859375,
+ "learning_rate": 1.92035084610614e-07,
+ "loss": 0.671,
+ "step": 231500
+ },
+ {
+ "epoch": 29.524051921608553,
+ "grad_norm": 45.42967987060547,
+ "learning_rate": 1.6988570922299992e-07,
+ "loss": 0.6662,
+ "step": 232000
+ },
+ {
+ "epoch": 29.587681343853397,
+ "grad_norm": 47.5881462097168,
+ "learning_rate": 1.4773633383538586e-07,
+ "loss": 0.6665,
+ "step": 232500
+ },
+ {
+ "epoch": 29.651310766098245,
+ "grad_norm": 71.63655090332031,
+ "learning_rate": 1.2558695844777177e-07,
+ "loss": 0.6718,
+ "step": 233000
+ },
+ {
+ "epoch": 29.71494018834309,
+ "grad_norm": 45.697998046875,
+ "learning_rate": 1.0343758306015771e-07,
+ "loss": 0.6657,
+ "step": 233500
+ },
+ {
+ "epoch": 29.778569610587937,
+ "grad_norm": 58.17982864379883,
+ "learning_rate": 8.128820767254363e-08,
+ "loss": 0.6677,
+ "step": 234000
+ },
+ {
+ "epoch": 29.84219903283278,
+ "grad_norm": 46.64686965942383,
+ "learning_rate": 5.9138832284929565e-08,
+ "loss": 0.6732,
+ "step": 234500
+ },
+ {
+ "epoch": 29.90582845507763,
+ "grad_norm": 52.96521759033203,
+ "learning_rate": 3.69894568973155e-08,
+ "loss": 0.6687,
+ "step": 235000
+ },
+ {
+ "epoch": 29.969457877322473,
+ "grad_norm": 44.063079833984375,
+ "learning_rate": 1.4840081509701428e-08,
+ "loss": 0.6732,
+ "step": 235500
+ }
+ ],
+ "logging_steps": 500,
+ "max_steps": 235740,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 30,
+ "save_steps": 10000,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 0.0,
+ "train_batch_size": 64,
+ "trial_name": null,
+ "trial_params": null
+ }
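The `log_history` entries in the `trainer_state.json` added above follow a fixed schema (`epoch`, `grad_norm`, `learning_rate`, `loss`, `step`, one record every `logging_steps = 500` optimizer steps). As a minimal sketch, the loss curve can be pulled out of such a file like this; the sample entries are copied from this commit, and the `loss_curve` helper name is hypothetical:

```python
import json

# Sample records copied from the log_history above; a real run would
# load the full file with: state = json.load(open("trainer_state.json"))
state = {
    "log_history": [
        {"epoch": 29.84219903283278, "grad_norm": 46.64686965942383,
         "learning_rate": 5.9138832284929565e-08, "loss": 0.6732, "step": 234500},
        {"epoch": 29.90582845507763, "grad_norm": 52.96521759033203,
         "learning_rate": 3.69894568973155e-08, "loss": 0.6687, "step": 235000},
        {"epoch": 29.969457877322473, "grad_norm": 44.063079833984375,
         "learning_rate": 1.4840081509701428e-08, "loss": 0.6732, "step": 235500},
    ],
    "logging_steps": 500,
    "max_steps": 235740,
    "num_train_epochs": 30,
}

def loss_curve(state):
    """Return (step, loss) pairs from a Trainer state dict, skipping
    records (e.g. eval summaries) that carry no training loss."""
    return [(e["step"], e["loss"]) for e in state["log_history"] if "loss" in e]

curve = loss_curve(state)
print(curve[-1])  # last logged (step, loss) pair
```

Note that eval or summary records in a full `log_history` may lack the `loss` key, hence the guard in the comprehension.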
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a26667283d7620e32c6f2b175a2b02d8782de715e181dbc3b1ac4afc313d48b7
+ size 5752
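The `training_args.bin` entry above is a Git LFS pointer file (spec v1): three `key value` lines giving the spec URL, the object's content hash, and its size in bytes. A small sketch of parsing such a pointer, using the exact pointer text from this commit (the `parse_lfs_pointer` helper name is hypothetical):

```python
# Git LFS pointer text as committed above for training_args.bin.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:a26667283d7620e32c6f2b175a2b02d8782de715e181dbc3b1ac4afc313d48b7
size 5752
"""

def parse_lfs_pointer(text):
    """Split each non-empty line on the first space into a key/value pair."""
    return dict(line.split(" ", 1) for line in text.splitlines() if line)

ptr = parse_lfs_pointer(pointer_text)
algo, digest = ptr["oid"].split(":", 1)  # hash algorithm, hex digest
size = int(ptr["size"])                  # object size in bytes
print(algo, size)  # sha256 5752
```

The pointer is what lives in the Git history; the actual 5752-byte binary is stored by the LFS backend and addressed by the sha256 digest.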