Commit 3068070 (verified) by AmberYifan · 1 parent: 42ce889

Model save

README.md ADDED
@@ -0,0 +1,57 @@
+ ---
+ base_model: mistralai/Mistral-7B-v0.3
+ library_name: transformers
+ model_name: Mistral-7B-v0.3-sft-ultrachat
+ tags:
+ - generated_from_trainer
+ - trl
+ - sft
+ licence: license
+ ---
+
+ # Model Card for Mistral-7B-v0.3-sft-ultrachat
+
+ This model is a fine-tuned version of [mistralai/Mistral-7B-v0.3](https://huggingface.co/mistralai/Mistral-7B-v0.3).
+ It has been trained using [TRL](https://github.com/huggingface/trl).
+
+ ## Quick start
+
+ ```python
+ from transformers import pipeline
+
+ question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+ generator = pipeline("text-generation", model="AmberYifan/Mistral-7B-v0.3-sft-ultrachat", device="cuda")
+ output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+ print(output["generated_text"])
+ ```
+
+ ## Training procedure
+
+
+
+ This model was trained with supervised fine-tuning (SFT).
+
+ ### Framework versions
+
+ - TRL: 0.12.2
+ - Transformers: 4.46.3
+ - Pytorch: 2.5.1+cu118
+ - Datasets: 3.2.0
+ - Tokenizers: 0.20.3
+
+ ## Citations
+
+
+
+ Cite TRL as:
+
+ ```bibtex
+ @misc{vonwerra2022trl,
+ title = {{TRL: Transformer Reinforcement Learning}},
+ author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
+ year = 2020,
+ journal = {GitHub repository},
+ publisher = {GitHub},
+ howpublished = {\url{https://github.com/huggingface/trl}}
+ }
+ ```
all_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+ "epoch": 1.0,
+ "total_flos": 113379083550720.0,
+ "train_loss": 1.171416565762112,
+ "train_runtime": 11018.7418,
+ "train_samples": 51966,
+ "train_samples_per_second": 6.29,
+ "train_steps_per_second": 0.197
+ }
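The throughput fields above can be cross-checked with a few lines of arithmetic. This sketch copies the numbers from all_results.json and takes `global_step` (2166) from trainer_state.json. Steps/s × runtime recovers roughly the logged step count. Samples/s × runtime comes to about 32 samples per optimizer step, which suggests, but does not confirm, an effective batch size of 32. Dividing `train_samples` by the runtime gives about 4.72 samples/s rather than 6.29, so the two fields apparently count samples differently. The files in this commit do not say why.

```python
# Figures copied from all_results.json (train_results.json holds the same values)
train_runtime = 11018.7418          # seconds
train_samples = 51966
train_samples_per_second = 6.29
train_steps_per_second = 0.197
global_step = 2166                  # "global_step" in trainer_state.json

hours = train_runtime / 3600
# Optimizer steps implied by the logged step rate
implied_steps = train_steps_per_second * train_runtime
# Samples implied by the logged sample rate, and how many that is per optimizer step
implied_samples = train_samples_per_second * train_runtime
samples_per_step = implied_samples / global_step
# Rate obtained by dividing the reported sample count by the runtime
naive_rate = train_samples / train_runtime

print(f"runtime: {hours:.2f} h")
print(f"steps implied by steps/s: {implied_steps:.0f} (global_step: {global_step})")
print(f"samples implied by samples/s: {implied_samples:.0f} (~{samples_per_step:.1f} per step)")
print(f"train_samples / runtime: {naive_rate:.2f} samples/s")
```

Run as written, this reports a runtime of about 3.06 h, about 2171 implied steps, and about 32.0 samples per step. The small gap between 2171 and 2166 comes from the two-significant-digit rounding of `train_steps_per_second`.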
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+ "_from_model_config": true,
+ "bos_token_id": 1,
+ "eos_token_id": 2,
+ "transformers_version": "4.46.3"
+ }
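generation_config.json only records the special-token IDs (BOS = 1, EOS = 2) and the Transformers version. `_from_model_config: true` indicates that the file was derived from the model's config.json. In practice, `generate()` stops at the EOS ID and `tokenizer.decode(..., skip_special_tokens=True)` hides special tokens.

The helper below is therefore purely illustrative. `strip_special` is a hypothetical function, and the token IDs are made up. It shows what the two IDs mark in a generated sequence: a leading BOS token, then everything up to the first EOS.

```python
import json

# Contents of generation_config.json from this commit
GENERATION_CONFIG = json.loads("""
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.46.3"
}
""")


def strip_special(token_ids: list[int], config: dict) -> list[int]:
    """Hypothetical helper: drop a leading BOS token and cut at the first EOS token."""
    bos, eos = config["bos_token_id"], config["eos_token_id"]
    if token_ids and token_ids[0] == bos:
        token_ids = token_ids[1:]
    if eos in token_ids:
        token_ids = token_ids[: token_ids.index(eos)]
    return token_ids


# Made-up token IDs: BOS, three content tokens, EOS, then padding
print(strip_special([1, 415, 2936, 9060, 2, 0, 0], GENERATION_CONFIG))  # [415, 2936, 9060]
```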
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+ "epoch": 1.0,
+ "total_flos": 113379083550720.0,
+ "train_loss": 1.171416565762112,
+ "train_runtime": 11018.7418,
+ "train_samples": 51966,
+ "train_samples_per_second": 6.29,
+ "train_steps_per_second": 0.197
+ }
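The `learning_rate` values in the `log_history` of trainer_state.json fit a linear warmup followed by cosine decay. The sketch below reproduces them under three assumptions, all inferred from the log rather than read from a training config:

- a peak rate of 2e-5;
- 217 warmup steps (0.1 × 2166, rounded up);
- decay to zero at step 2166.

This shape appears consistent with `transformers.get_cosine_schedule_with_warmup` (the `"cosine"` scheduler type). However, the files in this commit do not record which scheduler was actually used.

```python
import math

# Assumptions inferred from trainer_state.json, not read from a training config
PEAK_LR = 2e-5        # assumed peak learning rate
TOTAL_STEPS = 2166    # "global_step" in trainer_state.json
WARMUP_STEPS = 217    # assumed: ceil(0.1 * 2166), i.e. a warmup ratio of 0.1


def lr_at(step: int) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then cosine decay to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# A few (step, learning_rate) pairs copied from log_history
logged = {
    1: 9.216589861751152e-08,
    100: 9.216589861751153e-06,
    220: 1.9999883080288618e-05,
    500: 1.8977472843053962e-05,
    695: 1.7175708071017503e-05,
}

for step, lr in logged.items():
    print(f"step {step:4d}: logged {lr:.6e}  predicted {lr_at(step):.6e}")
```

If the assumed warmup length were off by even one step, the warmup values would disagree in the third significant digit. The match at steps 1 and 100 therefore pins down the 217-step warmup fairly tightly.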
trainer_state.json ADDED
@@ -0,0 +1,3088 @@
+ {
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 1.0,
+ "eval_steps": 500,
+ "global_step": 2166,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.0004616805170821791,
+ "grad_norm": 15.672144611335952,
+ "learning_rate": 9.216589861751152e-08,
+ "loss": 1.3168,
+ "step": 1
+ },
+ {
+ "epoch": 0.0023084025854108957,
+ "grad_norm": 14.680930201512727,
+ "learning_rate": 4.608294930875577e-07,
+ "loss": 1.2513,
+ "step": 5
+ },
+ {
+ "epoch": 0.0046168051708217915,
+ "grad_norm": 8.024031806354767,
+ "learning_rate": 9.216589861751154e-07,
+ "loss": 1.1965,
+ "step": 10
+ },
+ {
+ "epoch": 0.006925207756232687,
+ "grad_norm": 5.48716759630272,
+ "learning_rate": 1.382488479262673e-06,
+ "loss": 1.0985,
+ "step": 15
+ },
+ {
+ "epoch": 0.009233610341643583,
+ "grad_norm": 5.068857847099135,
+ "learning_rate": 1.8433179723502307e-06,
+ "loss": 1.0416,
+ "step": 20
+ },
+ {
+ "epoch": 0.011542012927054479,
+ "grad_norm": 5.784416390233076,
+ "learning_rate": 2.3041474654377884e-06,
+ "loss": 1.0414,
+ "step": 25
+ },
+ {
+ "epoch": 0.013850415512465374,
+ "grad_norm": 5.088059220204017,
+ "learning_rate": 2.764976958525346e-06,
+ "loss": 1.0977,
+ "step": 30
+ },
+ {
+ "epoch": 0.016158818097876268,
+ "grad_norm": 4.831013532516624,
+ "learning_rate": 3.225806451612903e-06,
+ "loss": 1.0864,
+ "step": 35
+ },
+ {
+ "epoch": 0.018467220683287166,
+ "grad_norm": 5.01522341202905,
+ "learning_rate": 3.6866359447004615e-06,
+ "loss": 1.0875,
+ "step": 40
+ },
+ {
+ "epoch": 0.02077562326869806,
+ "grad_norm": 5.2743430965797495,
+ "learning_rate": 4.147465437788019e-06,
+ "loss": 1.0582,
+ "step": 45
+ },
+ {
+ "epoch": 0.023084025854108958,
+ "grad_norm": 4.868797420697276,
+ "learning_rate": 4.608294930875577e-06,
+ "loss": 1.0706,
+ "step": 50
+ },
+ {
+ "epoch": 0.025392428439519853,
+ "grad_norm": 4.866881384682284,
+ "learning_rate": 5.0691244239631346e-06,
+ "loss": 1.0874,
+ "step": 55
+ },
+ {
+ "epoch": 0.027700831024930747,
+ "grad_norm": 4.942215298185941,
+ "learning_rate": 5.529953917050692e-06,
+ "loss": 1.0765,
+ "step": 60
+ },
+ {
+ "epoch": 0.030009233610341645,
+ "grad_norm": 4.510624758005134,
+ "learning_rate": 5.9907834101382485e-06,
+ "loss": 1.0651,
+ "step": 65
+ },
+ {
+ "epoch": 0.032317636195752536,
+ "grad_norm": 4.961160516804359,
+ "learning_rate": 6.451612903225806e-06,
+ "loss": 1.0826,
+ "step": 70
+ },
+ {
+ "epoch": 0.03462603878116344,
+ "grad_norm": 4.999318871040395,
+ "learning_rate": 6.912442396313365e-06,
+ "loss": 1.0966,
+ "step": 75
+ },
+ {
+ "epoch": 0.03693444136657433,
+ "grad_norm": 4.639315170945839,
+ "learning_rate": 7.373271889400923e-06,
+ "loss": 1.0934,
+ "step": 80
+ },
+ {
+ "epoch": 0.039242843951985226,
+ "grad_norm": 4.79620290699333,
+ "learning_rate": 7.83410138248848e-06,
+ "loss": 1.0932,
+ "step": 85
+ },
+ {
+ "epoch": 0.04155124653739612,
+ "grad_norm": 4.957993386602933,
+ "learning_rate": 8.294930875576038e-06,
+ "loss": 1.1032,
+ "step": 90
+ },
+ {
+ "epoch": 0.043859649122807015,
+ "grad_norm": 4.669607524515842,
+ "learning_rate": 8.755760368663595e-06,
+ "loss": 1.0875,
+ "step": 95
+ },
+ {
+ "epoch": 0.046168051708217916,
+ "grad_norm": 4.602861109332021,
+ "learning_rate": 9.216589861751153e-06,
+ "loss": 1.0809,
+ "step": 100
+ },
+ {
+ "epoch": 0.04847645429362881,
+ "grad_norm": 4.548726172146522,
+ "learning_rate": 9.67741935483871e-06,
+ "loss": 1.1218,
+ "step": 105
+ },
+ {
+ "epoch": 0.050784856879039705,
+ "grad_norm": 4.724149571099335,
+ "learning_rate": 1.0138248847926269e-05,
+ "loss": 1.1007,
+ "step": 110
+ },
+ {
+ "epoch": 0.0530932594644506,
+ "grad_norm": 5.309010432204349,
+ "learning_rate": 1.0599078341013826e-05,
+ "loss": 1.1368,
+ "step": 115
+ },
+ {
+ "epoch": 0.055401662049861494,
+ "grad_norm": 4.839305137997795,
+ "learning_rate": 1.1059907834101385e-05,
+ "loss": 1.1055,
+ "step": 120
+ },
+ {
+ "epoch": 0.05771006463527239,
+ "grad_norm": 4.5796294161615565,
+ "learning_rate": 1.152073732718894e-05,
+ "loss": 1.1155,
+ "step": 125
+ },
+ {
+ "epoch": 0.06001846722068329,
+ "grad_norm": 4.706959812240538,
+ "learning_rate": 1.1981566820276497e-05,
+ "loss": 1.1387,
+ "step": 130
+ },
+ {
+ "epoch": 0.062326869806094184,
+ "grad_norm": 4.492165983938348,
+ "learning_rate": 1.2442396313364056e-05,
+ "loss": 1.1733,
+ "step": 135
+ },
+ {
+ "epoch": 0.06463527239150507,
+ "grad_norm": 4.746032213098828,
+ "learning_rate": 1.2903225806451613e-05,
+ "loss": 1.1375,
+ "step": 140
+ },
+ {
+ "epoch": 0.06694367497691597,
+ "grad_norm": 4.713817907356248,
+ "learning_rate": 1.3364055299539171e-05,
+ "loss": 1.158,
+ "step": 145
+ },
+ {
+ "epoch": 0.06925207756232687,
+ "grad_norm": 4.342905646572964,
+ "learning_rate": 1.382488479262673e-05,
+ "loss": 1.1607,
+ "step": 150
+ },
+ {
+ "epoch": 0.07156048014773776,
+ "grad_norm": 4.502102336400582,
+ "learning_rate": 1.4285714285714287e-05,
+ "loss": 1.1382,
+ "step": 155
+ },
+ {
+ "epoch": 0.07386888273314866,
+ "grad_norm": 4.300393542300411,
+ "learning_rate": 1.4746543778801846e-05,
+ "loss": 1.1518,
+ "step": 160
+ },
+ {
+ "epoch": 0.07617728531855955,
+ "grad_norm": 4.400546990325483,
+ "learning_rate": 1.5207373271889403e-05,
+ "loss": 1.1436,
+ "step": 165
+ },
+ {
+ "epoch": 0.07848568790397045,
+ "grad_norm": 4.77590791643038,
+ "learning_rate": 1.566820276497696e-05,
+ "loss": 1.2173,
+ "step": 170
+ },
+ {
+ "epoch": 0.08079409048938135,
+ "grad_norm": 4.32969974114785,
+ "learning_rate": 1.6129032258064517e-05,
+ "loss": 1.1654,
+ "step": 175
+ },
+ {
+ "epoch": 0.08310249307479224,
+ "grad_norm": 5.285074448558262,
+ "learning_rate": 1.6589861751152075e-05,
+ "loss": 1.2185,
+ "step": 180
+ },
+ {
+ "epoch": 0.08541089566020314,
+ "grad_norm": 6.312179413881035,
+ "learning_rate": 1.705069124423963e-05,
+ "loss": 1.2063,
+ "step": 185
+ },
+ {
+ "epoch": 0.08771929824561403,
+ "grad_norm": 4.351482667809684,
+ "learning_rate": 1.751152073732719e-05,
+ "loss": 1.1814,
+ "step": 190
+ },
+ {
+ "epoch": 0.09002770083102493,
+ "grad_norm": 4.468079454686115,
+ "learning_rate": 1.7972350230414748e-05,
+ "loss": 1.2058,
+ "step": 195
+ },
+ {
+ "epoch": 0.09233610341643583,
+ "grad_norm": 5.51425273025908,
+ "learning_rate": 1.8433179723502307e-05,
+ "loss": 1.1646,
+ "step": 200
+ },
+ {
+ "epoch": 0.09464450600184672,
+ "grad_norm": 4.661323669253999,
+ "learning_rate": 1.8894009216589862e-05,
+ "loss": 1.1689,
+ "step": 205
+ },
+ {
+ "epoch": 0.09695290858725762,
+ "grad_norm": 726.888849011745,
+ "learning_rate": 1.935483870967742e-05,
+ "loss": 1.8785,
+ "step": 210
+ },
+ {
+ "epoch": 0.09926131117266851,
+ "grad_norm": 5.867844131835615,
+ "learning_rate": 1.981566820276498e-05,
+ "loss": 1.2661,
+ "step": 215
+ },
+ {
+ "epoch": 0.10156971375807941,
+ "grad_norm": 6.2684199277472015,
+ "learning_rate": 1.9999883080288618e-05,
+ "loss": 1.2545,
+ "step": 220
+ },
+ {
+ "epoch": 0.1038781163434903,
+ "grad_norm": 5.426004523317811,
+ "learning_rate": 1.999916858084231e-05,
+ "loss": 1.2259,
+ "step": 225
+ },
+ {
+ "epoch": 0.1061865189289012,
+ "grad_norm": 4.617593739028291,
+ "learning_rate": 1.999780458369908e-05,
+ "loss": 1.177,
+ "step": 230
+ },
+ {
+ "epoch": 0.1084949215143121,
+ "grad_norm": 4.412649452769939,
+ "learning_rate": 1.9995791177457598e-05,
+ "loss": 1.2127,
+ "step": 235
+ },
+ {
+ "epoch": 0.11080332409972299,
+ "grad_norm": 4.3422685444059965,
+ "learning_rate": 1.9993128492899012e-05,
+ "loss": 1.2398,
+ "step": 240
+ },
+ {
+ "epoch": 0.11311172668513389,
+ "grad_norm": 4.837187367612426,
+ "learning_rate": 1.9989816702978447e-05,
+ "loss": 1.2189,
+ "step": 245
+ },
+ {
+ "epoch": 0.11542012927054478,
+ "grad_norm": 4.203624655101154,
+ "learning_rate": 1.998585602281378e-05,
+ "loss": 1.1641,
+ "step": 250
+ },
+ {
+ "epoch": 0.11772853185595568,
+ "grad_norm": 4.172078683953242,
+ "learning_rate": 1.9981246709671668e-05,
+ "loss": 1.217,
+ "step": 255
+ },
+ {
+ "epoch": 0.12003693444136658,
+ "grad_norm": 4.445815868978626,
+ "learning_rate": 1.9975989062950828e-05,
+ "loss": 1.2198,
+ "step": 260
+ },
+ {
+ "epoch": 0.12234533702677747,
+ "grad_norm": 4.5591861583880045,
+ "learning_rate": 1.9970083424162598e-05,
+ "loss": 1.2971,
+ "step": 265
+ },
+ {
+ "epoch": 0.12465373961218837,
+ "grad_norm": 8.794456155689286,
+ "learning_rate": 1.9963530176908752e-05,
+ "loss": 1.2543,
+ "step": 270
+ },
+ {
+ "epoch": 0.12696214219759927,
+ "grad_norm": 4.296337355363852,
+ "learning_rate": 1.9956329746856583e-05,
+ "loss": 1.1902,
+ "step": 275
+ },
+ {
+ "epoch": 0.12927054478301014,
+ "grad_norm": 4.210037276606183,
+ "learning_rate": 1.9948482601711245e-05,
+ "loss": 1.2119,
+ "step": 280
+ },
+ {
+ "epoch": 0.13157894736842105,
+ "grad_norm": 4.525819047133829,
+ "learning_rate": 1.9939989251185386e-05,
+ "loss": 1.2267,
+ "step": 285
+ },
+ {
+ "epoch": 0.13388734995383195,
+ "grad_norm": 4.617114491344834,
+ "learning_rate": 1.993085024696604e-05,
+ "loss": 1.253,
+ "step": 290
+ },
+ {
+ "epoch": 0.13619575253924285,
+ "grad_norm": 4.306884827311129,
+ "learning_rate": 1.992106618267878e-05,
+ "loss": 1.2968,
+ "step": 295
+ },
+ {
+ "epoch": 0.13850415512465375,
+ "grad_norm": 4.06156778374873,
+ "learning_rate": 1.9910637693849166e-05,
+ "loss": 1.2523,
+ "step": 300
+ },
+ {
+ "epoch": 0.14081255771006462,
+ "grad_norm": 4.1725194301873225,
+ "learning_rate": 1.9899565457861463e-05,
+ "loss": 1.2465,
+ "step": 305
+ },
+ {
+ "epoch": 0.14312096029547552,
+ "grad_norm": 6.1582094432239005,
+ "learning_rate": 1.988785019391465e-05,
+ "loss": 1.2893,
+ "step": 310
+ },
+ {
+ "epoch": 0.14542936288088643,
+ "grad_norm": 4.468419480755536,
+ "learning_rate": 1.987549266297568e-05,
+ "loss": 1.2684,
+ "step": 315
+ },
+ {
+ "epoch": 0.14773776546629733,
+ "grad_norm": 4.663314719431853,
+ "learning_rate": 1.986249366773009e-05,
+ "loss": 1.2472,
+ "step": 320
+ },
+ {
+ "epoch": 0.15004616805170823,
+ "grad_norm": 4.557295583444763,
+ "learning_rate": 1.9848854052529822e-05,
+ "loss": 1.2856,
+ "step": 325
+ },
+ {
+ "epoch": 0.1523545706371191,
+ "grad_norm": 4.128322557091226,
+ "learning_rate": 1.9834574703338406e-05,
+ "loss": 1.2717,
+ "step": 330
+ },
+ {
+ "epoch": 0.15466297322253,
+ "grad_norm": 4.265562249971871,
+ "learning_rate": 1.9819656547673393e-05,
+ "loss": 1.2614,
+ "step": 335
+ },
+ {
+ "epoch": 0.1569713758079409,
+ "grad_norm": 4.189648461283852,
+ "learning_rate": 1.9804100554546127e-05,
+ "loss": 1.2221,
+ "step": 340
+ },
+ {
+ "epoch": 0.1592797783933518,
+ "grad_norm": 4.753781028146877,
+ "learning_rate": 1.9787907734398785e-05,
+ "loss": 1.2641,
+ "step": 345
+ },
+ {
+ "epoch": 0.1615881809787627,
+ "grad_norm": 4.624571477954883,
+ "learning_rate": 1.9771079139038765e-05,
+ "loss": 1.3082,
+ "step": 350
+ },
+ {
+ "epoch": 0.16389658356417358,
+ "grad_norm": 4.51192227446534,
+ "learning_rate": 1.9753615861570338e-05,
+ "loss": 1.3116,
+ "step": 355
+ },
+ {
+ "epoch": 0.16620498614958448,
+ "grad_norm": 4.392313959090253,
+ "learning_rate": 1.9735519036323656e-05,
+ "loss": 1.2304,
+ "step": 360
+ },
+ {
+ "epoch": 0.16851338873499538,
+ "grad_norm": 4.979240339208881,
+ "learning_rate": 1.9716789838781095e-05,
+ "loss": 1.2682,
+ "step": 365
+ },
+ {
+ "epoch": 0.17082179132040629,
+ "grad_norm": 4.96937836441046,
+ "learning_rate": 1.9697429485500862e-05,
+ "loss": 1.3054,
+ "step": 370
+ },
+ {
+ "epoch": 0.1731301939058172,
+ "grad_norm": 3.935739346153204,
+ "learning_rate": 1.9677439234038004e-05,
+ "loss": 1.2704,
+ "step": 375
+ },
+ {
+ "epoch": 0.17543859649122806,
+ "grad_norm": 4.366123456450803,
+ "learning_rate": 1.96568203828627e-05,
+ "loss": 1.236,
+ "step": 380
+ },
+ {
+ "epoch": 0.17774699907663896,
+ "grad_norm": 4.003638705307624,
+ "learning_rate": 1.963557427127594e-05,
+ "loss": 1.2134,
+ "step": 385
+ },
+ {
+ "epoch": 0.18005540166204986,
+ "grad_norm": 4.711836278485082,
+ "learning_rate": 1.9613702279322518e-05,
+ "loss": 1.2424,
+ "step": 390
+ },
+ {
+ "epoch": 0.18236380424746076,
+ "grad_norm": 4.7756346414851345,
+ "learning_rate": 1.95912058277014e-05,
+ "loss": 1.2513,
+ "step": 395
+ },
+ {
+ "epoch": 0.18467220683287167,
+ "grad_norm": 4.055556447653374,
+ "learning_rate": 1.9568086377673422e-05,
+ "loss": 1.2305,
+ "step": 400
+ },
+ {
+ "epoch": 0.18698060941828254,
+ "grad_norm": 3.9870929086001605,
+ "learning_rate": 1.9544345430966398e-05,
+ "loss": 1.2766,
+ "step": 405
+ },
+ {
+ "epoch": 0.18928901200369344,
+ "grad_norm": 4.3683569271591525,
+ "learning_rate": 1.951998452967756e-05,
+ "loss": 1.2701,
+ "step": 410
+ },
+ {
+ "epoch": 0.19159741458910434,
+ "grad_norm": 4.282177503327308,
+ "learning_rate": 1.9495005256173398e-05,
+ "loss": 1.2173,
+ "step": 415
+ },
+ {
+ "epoch": 0.19390581717451524,
+ "grad_norm": 4.122228465513596,
+ "learning_rate": 1.9469409232986876e-05,
+ "loss": 1.293,
+ "step": 420
+ },
+ {
+ "epoch": 0.19621421975992612,
+ "grad_norm": 4.391730062186428,
+ "learning_rate": 1.9443198122712036e-05,
+ "loss": 1.3013,
+ "step": 425
+ },
+ {
+ "epoch": 0.19852262234533702,
+ "grad_norm": 4.2533205751093,
+ "learning_rate": 1.9416373627896002e-05,
+ "loss": 1.2478,
+ "step": 430
+ },
+ {
+ "epoch": 0.20083102493074792,
+ "grad_norm": 4.982151398275928,
+ "learning_rate": 1.9388937490928402e-05,
+ "loss": 1.289,
+ "step": 435
+ },
+ {
+ "epoch": 0.20313942751615882,
+ "grad_norm": 4.254393940238592,
+ "learning_rate": 1.9360891493928186e-05,
+ "loss": 1.2773,
+ "step": 440
+ },
+ {
+ "epoch": 0.20544783010156972,
+ "grad_norm": 4.812233488623846,
+ "learning_rate": 1.933223745862786e-05,
+ "loss": 1.2571,
+ "step": 445
+ },
+ {
+ "epoch": 0.2077562326869806,
+ "grad_norm": 4.193819364681046,
+ "learning_rate": 1.930297724625516e-05,
+ "loss": 1.3167,
+ "step": 450
+ },
+ {
+ "epoch": 0.2100646352723915,
+ "grad_norm": 4.318967687699199,
+ "learning_rate": 1.9273112757412165e-05,
+ "loss": 1.2578,
+ "step": 455
+ },
+ {
+ "epoch": 0.2123730378578024,
+ "grad_norm": 4.021438837096732,
+ "learning_rate": 1.9242645931951833e-05,
+ "loss": 1.2703,
+ "step": 460
+ },
+ {
+ "epoch": 0.2146814404432133,
+ "grad_norm": 3.9988355301981344,
+ "learning_rate": 1.921157874885199e-05,
+ "loss": 1.2702,
+ "step": 465
+ },
+ {
+ "epoch": 0.2169898430286242,
+ "grad_norm": 3.866018897785007,
+ "learning_rate": 1.91799132260868e-05,
+ "loss": 1.2651,
+ "step": 470
+ },
+ {
+ "epoch": 0.21929824561403508,
+ "grad_norm": 4.228145732575894,
+ "learning_rate": 1.9147651420495696e-05,
+ "loss": 1.2429,
+ "step": 475
+ },
+ {
+ "epoch": 0.22160664819944598,
+ "grad_norm": 4.16044625111994,
+ "learning_rate": 1.9114795427649735e-05,
+ "loss": 1.2263,
+ "step": 480
+ },
+ {
+ "epoch": 0.22391505078485688,
+ "grad_norm": 3.7071606709047678,
+ "learning_rate": 1.9081347381715535e-05,
+ "loss": 1.2592,
+ "step": 485
+ },
+ {
+ "epoch": 0.22622345337026778,
+ "grad_norm": 4.093983584879632,
+ "learning_rate": 1.904730945531661e-05,
+ "loss": 1.2819,
+ "step": 490
+ },
+ {
+ "epoch": 0.22853185595567868,
+ "grad_norm": 4.247421291613911,
+ "learning_rate": 1.901268385939226e-05,
+ "loss": 1.3118,
+ "step": 495
+ },
+ {
+ "epoch": 0.23084025854108955,
+ "grad_norm": 4.088704419142061,
+ "learning_rate": 1.8977472843053962e-05,
+ "loss": 1.2529,
+ "step": 500
+ },
+ {
+ "epoch": 0.23314866112650046,
+ "grad_norm": 3.9526614218286698,
+ "learning_rate": 1.8941678693439272e-05,
+ "loss": 1.2254,
+ "step": 505
+ },
+ {
+ "epoch": 0.23545706371191136,
+ "grad_norm": 3.767319095108075,
+ "learning_rate": 1.8905303735563274e-05,
+ "loss": 1.2705,
+ "step": 510
+ },
+ {
+ "epoch": 0.23776546629732226,
+ "grad_norm": 4.1464464034097,
+ "learning_rate": 1.886835033216755e-05,
+ "loss": 1.2841,
+ "step": 515
+ },
+ {
+ "epoch": 0.24007386888273316,
+ "grad_norm": 4.154511161776497,
+ "learning_rate": 1.88308208835667e-05,
+ "loss": 1.2715,
+ "step": 520
+ },
746
+ {
747
+ "epoch": 0.24238227146814403,
748
+ "grad_norm": 4.815166096996458,
749
+ "learning_rate": 1.8792717827492446e-05,
750
+ "loss": 1.3034,
751
+ "step": 525
752
+ },
753
+ {
754
+ "epoch": 0.24469067405355494,
755
+ "grad_norm": 22.245546847367528,
756
+ "learning_rate": 1.8754043638935283e-05,
757
+ "loss": 1.2532,
758
+ "step": 530
759
+ },
760
+ {
761
+ "epoch": 0.24699907663896584,
762
+ "grad_norm": 4.177323522295811,
763
+ "learning_rate": 1.871480082998371e-05,
764
+ "loss": 1.2501,
765
+ "step": 535
766
+ },
767
+ {
768
+ "epoch": 0.24930747922437674,
769
+ "grad_norm": 3.9426463777773346,
770
+ "learning_rate": 1.867499194966106e-05,
771
+ "loss": 1.2683,
772
+ "step": 540
773
+ },
774
+ {
775
+ "epoch": 0.2516158818097876,
776
+ "grad_norm": 3.912690873331932,
777
+ "learning_rate": 1.8634619583759933e-05,
778
+ "loss": 1.2874,
779
+ "step": 545
780
+ },
781
+ {
782
+ "epoch": 0.25392428439519854,
783
+ "grad_norm": 3.972529239438344,
784
+ "learning_rate": 1.8593686354674223e-05,
785
+ "loss": 1.2698,
786
+ "step": 550
787
+ },
788
+ {
789
+ "epoch": 0.2562326869806094,
790
+ "grad_norm": 3.958572886167977,
791
+ "learning_rate": 1.8552194921228793e-05,
792
+ "loss": 1.2293,
793
+ "step": 555
794
+ },
795
+ {
796
+ "epoch": 0.2585410895660203,
797
+ "grad_norm": 3.7553829117034767,
798
+ "learning_rate": 1.851014797850676e-05,
799
+ "loss": 1.2818,
800
+ "step": 560
801
+ },
802
+ {
803
+ "epoch": 0.2608494921514312,
804
+ "grad_norm": 4.352268879736511,
805
+ "learning_rate": 1.8467548257674453e-05,
806
+ "loss": 1.2552,
807
+ "step": 565
808
+ },
809
+ {
810
+ "epoch": 0.2631578947368421,
811
+ "grad_norm": 5.014139215739045,
812
+ "learning_rate": 1.8424398525803983e-05,
813
+ "loss": 1.2228,
814
+ "step": 570
815
+ },
816
+ {
817
+ "epoch": 0.265466297322253,
818
+ "grad_norm": 4.192590762422093,
819
+ "learning_rate": 1.8380701585693526e-05,
820
+ "loss": 1.2526,
821
+ "step": 575
822
+ },
823
+ {
824
+ "epoch": 0.2677746999076639,
825
+ "grad_norm": 4.209340122955672,
826
+ "learning_rate": 1.8336460275685267e-05,
827
+ "loss": 1.2681,
828
+ "step": 580
829
+ },
830
+ {
831
+ "epoch": 0.27008310249307477,
832
+ "grad_norm": 3.801129619164067,
833
+ "learning_rate": 1.8291677469481025e-05,
+ "loss": 1.2623,
+ "step": 585
+ },
+ {
+ "epoch": 0.2723915050784857,
+ "grad_norm": 5.60448449703679,
+ "learning_rate": 1.8246356075955594e-05,
+ "loss": 1.2778,
+ "step": 590
+ },
+ {
+ "epoch": 0.27469990766389657,
+ "grad_norm": 3.8415685450636143,
+ "learning_rate": 1.820049903896782e-05,
+ "loss": 1.2546,
+ "step": 595
+ },
+ {
+ "epoch": 0.2770083102493075,
+ "grad_norm": 3.766423848242755,
+ "learning_rate": 1.8154109337169326e-05,
+ "loss": 1.2994,
+ "step": 600
+ },
+ {
+ "epoch": 0.2793167128347184,
+ "grad_norm": 3.8445299977202363,
+ "learning_rate": 1.8107189983811094e-05,
+ "loss": 1.2779,
+ "step": 605
+ },
+ {
+ "epoch": 0.28162511542012925,
+ "grad_norm": 4.20182793655244,
+ "learning_rate": 1.8059744026547713e-05,
+ "loss": 1.2794,
+ "step": 610
+ },
+ {
+ "epoch": 0.2839335180055402,
+ "grad_norm": 3.6927184982852554,
+ "learning_rate": 1.8011774547239403e-05,
+ "loss": 1.2217,
+ "step": 615
+ },
+ {
+ "epoch": 0.28624192059095105,
+ "grad_norm": 3.906241578603264,
+ "learning_rate": 1.796328466175186e-05,
+ "loss": 1.3162,
+ "step": 620
+ },
+ {
+ "epoch": 0.288550323176362,
+ "grad_norm": 3.7221850266429675,
+ "learning_rate": 1.791427751975385e-05,
+ "loss": 1.2591,
+ "step": 625
+ },
+ {
+ "epoch": 0.29085872576177285,
+ "grad_norm": 4.11815775927983,
+ "learning_rate": 1.786475630451262e-05,
+ "loss": 1.2572,
+ "step": 630
+ },
+ {
+ "epoch": 0.2931671283471837,
+ "grad_norm": 3.8995508626898454,
+ "learning_rate": 1.781472423268713e-05,
+ "loss": 1.2604,
+ "step": 635
+ },
+ {
+ "epoch": 0.29547553093259465,
+ "grad_norm": 4.5219499712986035,
+ "learning_rate": 1.776418455411913e-05,
+ "loss": 1.298,
+ "step": 640
+ },
+ {
+ "epoch": 0.29778393351800553,
+ "grad_norm": 4.5899598168207785,
+ "learning_rate": 1.7713140551622032e-05,
+ "loss": 1.2664,
+ "step": 645
+ },
+ {
+ "epoch": 0.30009233610341646,
+ "grad_norm": 4.641570078800192,
+ "learning_rate": 1.7661595540767714e-05,
+ "loss": 1.2689,
+ "step": 650
+ },
+ {
+ "epoch": 0.30240073868882733,
+ "grad_norm": 4.383087991217795,
+ "learning_rate": 1.7609552869671126e-05,
+ "loss": 1.2551,
+ "step": 655
+ },
+ {
+ "epoch": 0.3047091412742382,
+ "grad_norm": 3.9687899547292576,
+ "learning_rate": 1.7557015918772822e-05,
+ "loss": 1.2379,
+ "step": 660
+ },
+ {
+ "epoch": 0.30701754385964913,
+ "grad_norm": 4.133840300932013,
+ "learning_rate": 1.750398810061939e-05,
+ "loss": 1.2779,
+ "step": 665
+ },
+ {
+ "epoch": 0.30932594644506,
+ "grad_norm": 3.84778329275165,
+ "learning_rate": 1.745047285964179e-05,
+ "loss": 1.2306,
+ "step": 670
+ },
+ {
+ "epoch": 0.31163434903047094,
+ "grad_norm": 4.054603771464119,
+ "learning_rate": 1.7396473671931597e-05,
+ "loss": 1.2089,
+ "step": 675
+ },
+ {
+ "epoch": 0.3139427516158818,
+ "grad_norm": 4.013882196193361,
+ "learning_rate": 1.7341994045015245e-05,
+ "loss": 1.2225,
+ "step": 680
+ },
+ {
+ "epoch": 0.3162511542012927,
+ "grad_norm": 4.076399340438248,
+ "learning_rate": 1.7287037517626174e-05,
+ "loss": 1.3166,
+ "step": 685
+ },
+ {
+ "epoch": 0.3185595567867036,
+ "grad_norm": 3.991144267549364,
+ "learning_rate": 1.7231607659474972e-05,
+ "loss": 1.2706,
+ "step": 690
+ },
+ {
+ "epoch": 0.3208679593721145,
+ "grad_norm": 3.592102167186549,
+ "learning_rate": 1.7175708071017503e-05,
+ "loss": 1.2066,
+ "step": 695
+ },
+ {
+ "epoch": 0.3231763619575254,
+ "grad_norm": 4.2490266329322655,
+ "learning_rate": 1.7119342383221055e-05,
+ "loss": 1.3011,
+ "step": 700
+ },
+ {
+ "epoch": 0.3254847645429363,
+ "grad_norm": 3.7487591296204266,
+ "learning_rate": 1.7062514257328474e-05,
+ "loss": 1.2587,
+ "step": 705
+ },
+ {
+ "epoch": 0.32779316712834716,
+ "grad_norm": 3.6111287365523466,
+ "learning_rate": 1.7005227384620336e-05,
+ "loss": 1.2626,
+ "step": 710
+ },
+ {
+ "epoch": 0.3301015697137581,
+ "grad_norm": 3.8624035554609892,
+ "learning_rate": 1.6947485486175223e-05,
+ "loss": 1.266,
+ "step": 715
+ },
+ {
+ "epoch": 0.33240997229916897,
+ "grad_norm": 4.191574332500623,
+ "learning_rate": 1.688929231262797e-05,
+ "loss": 1.2275,
+ "step": 720
+ },
+ {
+ "epoch": 0.3347183748845799,
+ "grad_norm": 3.931766819485826,
+ "learning_rate": 1.683065164392606e-05,
+ "loss": 1.2525,
+ "step": 725
+ },
+ {
+ "epoch": 0.33702677746999077,
+ "grad_norm": 3.8224846577065685,
+ "learning_rate": 1.6771567289084122e-05,
+ "loss": 1.228,
+ "step": 730
+ },
+ {
+ "epoch": 0.33933518005540164,
+ "grad_norm": 3.7975499971303024,
+ "learning_rate": 1.6712043085936473e-05,
+ "loss": 1.2121,
+ "step": 735
+ },
+ {
+ "epoch": 0.34164358264081257,
+ "grad_norm": 3.7233983105114326,
+ "learning_rate": 1.6652082900887858e-05,
+ "loss": 1.2439,
+ "step": 740
+ },
+ {
+ "epoch": 0.34395198522622344,
+ "grad_norm": 4.0496534376278674,
+ "learning_rate": 1.6591690628662305e-05,
+ "loss": 1.3064,
+ "step": 745
+ },
+ {
+ "epoch": 0.3462603878116344,
+ "grad_norm": 4.397682055950332,
+ "learning_rate": 1.6530870192050134e-05,
+ "loss": 1.2433,
+ "step": 750
+ },
+ {
+ "epoch": 0.34856879039704525,
+ "grad_norm": 3.999160650641557,
+ "learning_rate": 1.6469625541653152e-05,
+ "loss": 1.2117,
+ "step": 755
+ },
+ {
+ "epoch": 0.3508771929824561,
+ "grad_norm": 4.475385002364299,
+ "learning_rate": 1.6407960655628055e-05,
+ "loss": 1.203,
+ "step": 760
+ },
+ {
+ "epoch": 0.35318559556786705,
+ "grad_norm": 3.5042875341184416,
+ "learning_rate": 1.6345879539428e-05,
+ "loss": 1.2567,
+ "step": 765
+ },
+ {
+ "epoch": 0.3554939981532779,
+ "grad_norm": 3.678612416780679,
+ "learning_rate": 1.6283386225542467e-05,
+ "loss": 1.2276,
+ "step": 770
+ },
+ {
+ "epoch": 0.35780240073868885,
+ "grad_norm": 5.063348081613382,
+ "learning_rate": 1.622048477323529e-05,
+ "loss": 1.2297,
+ "step": 775
+ },
+ {
+ "epoch": 0.3601108033240997,
+ "grad_norm": 4.04397764374825,
+ "learning_rate": 1.6157179268281007e-05,
+ "loss": 1.2498,
+ "step": 780
+ },
+ {
+ "epoch": 0.3624192059095106,
+ "grad_norm": 3.7786600086660553,
+ "learning_rate": 1.6093473822699467e-05,
+ "loss": 1.2156,
+ "step": 785
+ },
+ {
+ "epoch": 0.36472760849492153,
+ "grad_norm": 3.726670143436363,
+ "learning_rate": 1.6029372574488732e-05,
+ "loss": 1.248,
+ "step": 790
+ },
+ {
+ "epoch": 0.3670360110803324,
+ "grad_norm": 3.6023664901819115,
+ "learning_rate": 1.5964879687356286e-05,
+ "loss": 1.2762,
+ "step": 795
+ },
+ {
+ "epoch": 0.36934441366574333,
+ "grad_norm": 3.684618843127009,
+ "learning_rate": 1.589999935044859e-05,
+ "loss": 1.2269,
+ "step": 800
+ },
+ {
+ "epoch": 0.3716528162511542,
+ "grad_norm": 3.6119834291134465,
+ "learning_rate": 1.5834735778078968e-05,
+ "loss": 1.2078,
+ "step": 805
+ },
+ {
+ "epoch": 0.3739612188365651,
+ "grad_norm": 3.66332363718426,
+ "learning_rate": 1.5769093209453876e-05,
+ "loss": 1.2713,
+ "step": 810
+ },
+ {
+ "epoch": 0.376269621421976,
+ "grad_norm": 4.137676249046753,
+ "learning_rate": 1.5703075908397523e-05,
+ "loss": 1.2816,
+ "step": 815
+ },
+ {
+ "epoch": 0.3785780240073869,
+ "grad_norm": 3.8481468093108475,
+ "learning_rate": 1.563668816307494e-05,
+ "loss": 1.2203,
+ "step": 820
+ },
+ {
+ "epoch": 0.3808864265927978,
+ "grad_norm": 3.7158307301305156,
+ "learning_rate": 1.556993428571342e-05,
+ "loss": 1.2163,
+ "step": 825
+ },
+ {
+ "epoch": 0.3831948291782087,
+ "grad_norm": 3.851222452502614,
+ "learning_rate": 1.550281861232243e-05,
+ "loss": 1.243,
+ "step": 830
+ },
+ {
+ "epoch": 0.38550323176361956,
+ "grad_norm": 3.6817891692377978,
+ "learning_rate": 1.5435345502411956e-05,
+ "loss": 1.2821,
+ "step": 835
+ },
+ {
+ "epoch": 0.3878116343490305,
+ "grad_norm": 3.9683025462284998,
+ "learning_rate": 1.536751933870934e-05,
+ "loss": 1.2019,
+ "step": 840
+ },
+ {
+ "epoch": 0.39012003693444136,
+ "grad_norm": 3.94265762295689,
+ "learning_rate": 1.5299344526874576e-05,
+ "loss": 1.2774,
+ "step": 845
+ },
+ {
+ "epoch": 0.39242843951985223,
+ "grad_norm": 4.123641725136207,
+ "learning_rate": 1.5230825495214184e-05,
+ "loss": 1.2352,
+ "step": 850
+ },
+ {
+ "epoch": 0.39473684210526316,
+ "grad_norm": 3.9570109790957653,
+ "learning_rate": 1.5161966694393516e-05,
+ "loss": 1.215,
+ "step": 855
+ },
+ {
+ "epoch": 0.39704524469067404,
+ "grad_norm": 3.6427091867450714,
+ "learning_rate": 1.5092772597147707e-05,
+ "loss": 1.2202,
+ "step": 860
+ },
+ {
+ "epoch": 0.39935364727608497,
+ "grad_norm": 3.8425754107191796,
+ "learning_rate": 1.5023247697991114e-05,
+ "loss": 1.2432,
+ "step": 865
+ },
+ {
+ "epoch": 0.40166204986149584,
+ "grad_norm": 3.759319372367797,
+ "learning_rate": 1.4953396512925398e-05,
+ "loss": 1.1838,
+ "step": 870
+ },
+ {
+ "epoch": 0.4039704524469067,
+ "grad_norm": 3.872324982369786,
+ "learning_rate": 1.4883223579146167e-05,
+ "loss": 1.2331,
+ "step": 875
+ },
+ {
+ "epoch": 0.40627885503231764,
+ "grad_norm": 3.8616658245003435,
+ "learning_rate": 1.4812733454748283e-05,
+ "loss": 1.2277,
+ "step": 880
+ },
+ {
+ "epoch": 0.4085872576177285,
+ "grad_norm": 3.5624714154298163,
+ "learning_rate": 1.4741930718429772e-05,
+ "loss": 1.2051,
+ "step": 885
+ },
+ {
+ "epoch": 0.41089566020313945,
+ "grad_norm": 3.6961173549363924,
+ "learning_rate": 1.4670819969194416e-05,
+ "loss": 1.2309,
+ "step": 890
+ },
+ {
+ "epoch": 0.4132040627885503,
+ "grad_norm": 3.5654510220296847,
+ "learning_rate": 1.4599405826053039e-05,
+ "loss": 1.1884,
+ "step": 895
+ },
+ {
+ "epoch": 0.4155124653739612,
+ "grad_norm": 4.205884899208378,
+ "learning_rate": 1.4527692927723465e-05,
+ "loss": 1.2223,
+ "step": 900
+ },
+ {
+ "epoch": 0.4178208679593721,
+ "grad_norm": 3.9431786244545997,
+ "learning_rate": 1.4455685932329204e-05,
+ "loss": 1.2389,
+ "step": 905
+ },
+ {
+ "epoch": 0.420129270544783,
+ "grad_norm": 3.579703652121505,
+ "learning_rate": 1.4383389517096899e-05,
+ "loss": 1.2429,
+ "step": 910
+ },
+ {
+ "epoch": 0.4224376731301939,
+ "grad_norm": 3.7807582830713105,
+ "learning_rate": 1.4310808378052506e-05,
+ "loss": 1.1874,
+ "step": 915
+ },
+ {
+ "epoch": 0.4247460757156048,
+ "grad_norm": 3.9020463886513914,
+ "learning_rate": 1.4237947229716262e-05,
+ "loss": 1.2587,
+ "step": 920
+ },
+ {
+ "epoch": 0.42705447830101567,
+ "grad_norm": 3.7663448915088633,
+ "learning_rate": 1.4164810804796464e-05,
+ "loss": 1.184,
+ "step": 925
+ },
+ {
+ "epoch": 0.4293628808864266,
+ "grad_norm": 3.7907471270783937,
+ "learning_rate": 1.409140385388203e-05,
+ "loss": 1.2445,
+ "step": 930
+ },
+ {
+ "epoch": 0.4316712834718375,
+ "grad_norm": 3.791543245723202,
+ "learning_rate": 1.4017731145133955e-05,
+ "loss": 1.2527,
+ "step": 935
+ },
+ {
+ "epoch": 0.4339796860572484,
+ "grad_norm": 3.8566751713668666,
+ "learning_rate": 1.3943797463975575e-05,
+ "loss": 1.2048,
+ "step": 940
+ },
+ {
+ "epoch": 0.4362880886426593,
+ "grad_norm": 3.943257567360323,
+ "learning_rate": 1.3869607612781733e-05,
+ "loss": 1.2773,
+ "step": 945
+ },
+ {
+ "epoch": 0.43859649122807015,
+ "grad_norm": 3.53206021655625,
+ "learning_rate": 1.3795166410566834e-05,
+ "loss": 1.2066,
+ "step": 950
+ },
+ {
+ "epoch": 0.4409048938134811,
+ "grad_norm": 3.8322607840339504,
+ "learning_rate": 1.372047869267184e-05,
+ "loss": 1.2104,
+ "step": 955
+ },
+ {
+ "epoch": 0.44321329639889195,
+ "grad_norm": 4.982802180271467,
+ "learning_rate": 1.364554931045018e-05,
+ "loss": 1.2782,
+ "step": 960
+ },
+ {
+ "epoch": 0.4455216989843029,
+ "grad_norm": 4.121927772157904,
+ "learning_rate": 1.3570383130952627e-05,
+ "loss": 1.2221,
+ "step": 965
+ },
+ {
+ "epoch": 0.44783010156971376,
+ "grad_norm": 3.5401426054616674,
+ "learning_rate": 1.349498503661116e-05,
+ "loss": 1.249,
+ "step": 970
+ },
+ {
+ "epoch": 0.45013850415512463,
+ "grad_norm": 3.8347876039826647,
+ "learning_rate": 1.3419359924921833e-05,
+ "loss": 1.2736,
+ "step": 975
+ },
+ {
+ "epoch": 0.45244690674053556,
+ "grad_norm": 4.86416192250325,
+ "learning_rate": 1.3343512708126642e-05,
+ "loss": 1.2032,
+ "step": 980
+ },
+ {
+ "epoch": 0.45475530932594643,
+ "grad_norm": 3.8508803970513004,
+ "learning_rate": 1.326744831289447e-05,
+ "loss": 1.2465,
+ "step": 985
+ },
+ {
+ "epoch": 0.45706371191135736,
+ "grad_norm": 3.276661833625774,
+ "learning_rate": 1.3191171680001048e-05,
+ "loss": 1.1905,
+ "step": 990
+ },
+ {
+ "epoch": 0.45937211449676824,
+ "grad_norm": 3.6488550777243933,
+ "learning_rate": 1.3114687764008048e-05,
+ "loss": 1.1991,
+ "step": 995
+ },
+ {
+ "epoch": 0.4616805170821791,
+ "grad_norm": 3.9637997706000223,
+ "learning_rate": 1.3038001532941249e-05,
+ "loss": 1.1994,
+ "step": 1000
+ },
+ {
+ "epoch": 0.46398891966759004,
+ "grad_norm": 3.7798295608326447,
+ "learning_rate": 1.2961117967967844e-05,
+ "loss": 1.2327,
+ "step": 1005
+ },
+ {
+ "epoch": 0.4662973222530009,
+ "grad_norm": 3.742363753899004,
+ "learning_rate": 1.2884042063072881e-05,
+ "loss": 1.2415,
+ "step": 1010
+ },
+ {
+ "epoch": 0.46860572483841184,
+ "grad_norm": 4.00995610689072,
+ "learning_rate": 1.280677882473488e-05,
+ "loss": 1.2449,
+ "step": 1015
+ },
+ {
+ "epoch": 0.4709141274238227,
+ "grad_norm": 3.7802768150285284,
+ "learning_rate": 1.272933327160063e-05,
+ "loss": 1.2055,
+ "step": 1020
+ },
+ {
+ "epoch": 0.4732225300092336,
+ "grad_norm": 3.979719082398227,
+ "learning_rate": 1.2651710434159223e-05,
+ "loss": 1.1452,
+ "step": 1025
+ },
+ {
+ "epoch": 0.4755309325946445,
+ "grad_norm": 3.7987734509998012,
+ "learning_rate": 1.2573915354415274e-05,
+ "loss": 1.2266,
+ "step": 1030
+ },
+ {
+ "epoch": 0.4778393351800554,
+ "grad_norm": 3.4449265105850344,
+ "learning_rate": 1.2495953085561426e-05,
+ "loss": 1.1678,
+ "step": 1035
+ },
+ {
+ "epoch": 0.4801477377654663,
+ "grad_norm": 4.703831538180476,
+ "learning_rate": 1.241782869165012e-05,
+ "loss": 1.1893,
+ "step": 1040
+ },
+ {
+ "epoch": 0.4824561403508772,
+ "grad_norm": 3.56138065098868,
+ "learning_rate": 1.2339547247264658e-05,
+ "loss": 1.2285,
+ "step": 1045
+ },
+ {
+ "epoch": 0.48476454293628807,
+ "grad_norm": 3.8664090630676147,
+ "learning_rate": 1.2261113837189587e-05,
+ "loss": 1.1995,
+ "step": 1050
+ },
+ {
+ "epoch": 0.487072945521699,
+ "grad_norm": 3.6587622685467553,
+ "learning_rate": 1.2182533556080402e-05,
+ "loss": 1.2456,
+ "step": 1055
+ },
+ {
+ "epoch": 0.48938134810710987,
+ "grad_norm": 3.4219623018934615,
+ "learning_rate": 1.2103811508132642e-05,
+ "loss": 1.1904,
+ "step": 1060
+ },
+ {
+ "epoch": 0.4916897506925208,
+ "grad_norm": 3.91141223990254,
+ "learning_rate": 1.2024952806750321e-05,
+ "loss": 1.1811,
+ "step": 1065
+ },
+ {
+ "epoch": 0.4939981532779317,
+ "grad_norm": 3.707066130468398,
+ "learning_rate": 1.1945962574213814e-05,
+ "loss": 1.212,
+ "step": 1070
+ },
+ {
+ "epoch": 0.49630655586334255,
+ "grad_norm": 3.5782501836947653,
+ "learning_rate": 1.1866845941347118e-05,
+ "loss": 1.2255,
+ "step": 1075
+ },
+ {
+ "epoch": 0.4986149584487535,
+ "grad_norm": 4.303350644777213,
+ "learning_rate": 1.1787608047184583e-05,
+ "loss": 1.1376,
+ "step": 1080
+ },
+ {
+ "epoch": 0.5009233610341643,
+ "grad_norm": 3.419543860379626,
+ "learning_rate": 1.1708254038637115e-05,
+ "loss": 1.1872,
+ "step": 1085
+ },
+ {
+ "epoch": 0.5032317636195752,
+ "grad_norm": 3.586294780528409,
+ "learning_rate": 1.1628789070157836e-05,
+ "loss": 1.2114,
+ "step": 1090
+ },
+ {
+ "epoch": 0.5055401662049861,
+ "grad_norm": 3.6647616517214496,
+ "learning_rate": 1.1549218303407305e-05,
+ "loss": 1.2088,
+ "step": 1095
+ },
+ {
+ "epoch": 0.5078485687903971,
+ "grad_norm": 3.6209405687157794,
+ "learning_rate": 1.1469546906918219e-05,
+ "loss": 1.1535,
+ "step": 1100
+ },
+ {
+ "epoch": 0.510156971375808,
+ "grad_norm": 3.4760951984933777,
+ "learning_rate": 1.1389780055759689e-05,
+ "loss": 1.1692,
+ "step": 1105
+ },
+ {
+ "epoch": 0.5124653739612188,
+ "grad_norm": 3.523587148397925,
+ "learning_rate": 1.1309922931201114e-05,
+ "loss": 1.1795,
+ "step": 1110
+ },
+ {
+ "epoch": 0.5147737765466297,
+ "grad_norm": 3.399747435026194,
+ "learning_rate": 1.1229980720375609e-05,
+ "loss": 1.1913,
+ "step": 1115
+ },
+ {
+ "epoch": 0.5170821791320406,
+ "grad_norm": 3.802970464768176,
+ "learning_rate": 1.114995861594308e-05,
+ "loss": 1.1692,
+ "step": 1120
+ },
+ {
+ "epoch": 0.5193905817174516,
+ "grad_norm": 3.571347595436078,
+ "learning_rate": 1.1069861815752944e-05,
+ "loss": 1.1575,
+ "step": 1125
+ },
+ {
+ "epoch": 0.5216989843028624,
+ "grad_norm": 3.702241350827994,
+ "learning_rate": 1.0989695522506486e-05,
+ "loss": 1.1776,
+ "step": 1130
+ },
+ {
+ "epoch": 0.5240073868882733,
+ "grad_norm": 4.396145181294285,
+ "learning_rate": 1.0909464943418926e-05,
+ "loss": 1.2055,
+ "step": 1135
+ },
+ {
+ "epoch": 0.5263157894736842,
+ "grad_norm": 3.402649511273165,
+ "learning_rate": 1.0829175289881188e-05,
+ "loss": 1.2024,
+ "step": 1140
+ },
+ {
+ "epoch": 0.528624192059095,
+ "grad_norm": 3.321901777095843,
+ "learning_rate": 1.074883177712138e-05,
+ "loss": 1.1317,
+ "step": 1145
+ },
+ {
+ "epoch": 0.530932594644506,
+ "grad_norm": 4.575011114858196,
+ "learning_rate": 1.0668439623866043e-05,
+ "loss": 1.1516,
+ "step": 1150
+ },
+ {
+ "epoch": 0.5332409972299169,
+ "grad_norm": 3.428811319179132,
+ "learning_rate": 1.0588004052001177e-05,
+ "loss": 1.1326,
+ "step": 1155
+ },
+ {
+ "epoch": 0.5355493998153278,
+ "grad_norm": 3.758823500740248,
+ "learning_rate": 1.0507530286233042e-05,
+ "loss": 1.1523,
+ "step": 1160
+ },
+ {
+ "epoch": 0.5378578024007387,
+ "grad_norm": 3.828420445656179,
+ "learning_rate": 1.0427023553748792e-05,
+ "loss": 1.215,
+ "step": 1165
+ },
+ {
+ "epoch": 0.5401662049861495,
+ "grad_norm": 3.872474623427253,
+ "learning_rate": 1.0346489083876928e-05,
+ "loss": 1.1798,
+ "step": 1170
+ },
+ {
+ "epoch": 0.5424746075715605,
+ "grad_norm": 4.343223419966708,
+ "learning_rate": 1.0265932107747656e-05,
+ "loss": 1.1964,
+ "step": 1175
+ },
+ {
+ "epoch": 0.5447830101569714,
+ "grad_norm": 3.4458152638291533,
+ "learning_rate": 1.0185357857953064e-05,
+ "loss": 1.188,
+ "step": 1180
+ },
+ {
+ "epoch": 0.5470914127423823,
+ "grad_norm": 3.3343026801443765,
+ "learning_rate": 1.0104771568207266e-05,
+ "loss": 1.1524,
+ "step": 1185
+ },
+ {
+ "epoch": 0.5493998153277931,
+ "grad_norm": 3.8325280372919774,
+ "learning_rate": 1.0024178473006418e-05,
+ "loss": 1.1445,
+ "step": 1190
+ },
+ {
+ "epoch": 0.551708217913204,
+ "grad_norm": 3.913934401934443,
+ "learning_rate": 9.943583807288746e-06,
+ "loss": 1.1497,
+ "step": 1195
+ },
+ {
+ "epoch": 0.554016620498615,
+ "grad_norm": 3.8771337742661585,
+ "learning_rate": 9.862992806094473e-06,
+ "loss": 1.1584,
+ "step": 1200
+ },
+ {
+ "epoch": 0.5563250230840259,
+ "grad_norm": 3.385706053842486,
+ "learning_rate": 9.782410704225793e-06,
+ "loss": 1.133,
+ "step": 1205
+ },
+ {
+ "epoch": 0.5586334256694367,
+ "grad_norm": 3.228558572718497,
+ "learning_rate": 9.701842735906855e-06,
+ "loss": 1.1714,
+ "step": 1210
+ },
+ {
+ "epoch": 0.5609418282548476,
+ "grad_norm": 3.376489834368575,
+ "learning_rate": 9.621294134443747e-06,
+ "loss": 1.1782,
+ "step": 1215
+ },
+ {
+ "epoch": 0.5632502308402585,
+ "grad_norm": 4.101023970778267,
+ "learning_rate": 9.54077013188459e-06,
+ "loss": 1.1679,
+ "step": 1220
+ },
+ {
+ "epoch": 0.5655586334256695,
+ "grad_norm": 3.459693677788322,
+ "learning_rate": 9.460275958679674e-06,
+ "loss": 1.2272,
+ "step": 1225
+ },
+ {
+ "epoch": 0.5678670360110804,
+ "grad_norm": 3.5741244509053556,
+ "learning_rate": 9.379816843341715e-06,
+ "loss": 1.1679,
+ "step": 1230
+ },
+ {
+ "epoch": 0.5701754385964912,
+ "grad_norm": 14.959841662019736,
+ "learning_rate": 9.299398012106246e-06,
+ "loss": 1.1557,
+ "step": 1235
+ },
+ {
+ "epoch": 0.5724838411819021,
+ "grad_norm": 3.479142794568544,
+ "learning_rate": 9.219024688592136e-06,
+ "loss": 1.191,
+ "step": 1240
+ },
+ {
+ "epoch": 0.574792243767313,
+ "grad_norm": 3.4791994405128195,
+ "learning_rate": 9.138702093462286e-06,
+ "loss": 1.1632,
+ "step": 1245
+ },
+ {
+ "epoch": 0.577100646352724,
+ "grad_norm": 3.378297795269278,
+ "learning_rate": 9.058435444084543e-06,
+ "loss": 1.2058,
+ "step": 1250
+ },
+ {
+ "epoch": 0.5794090489381348,
+ "grad_norm": 3.3312286796444948,
+ "learning_rate": 8.978229954192775e-06,
+ "loss": 1.2072,
+ "step": 1255
+ },
+ {
+ "epoch": 0.5817174515235457,
+ "grad_norm": 3.2936946867277497,
+ "learning_rate": 8.898090833548226e-06,
+ "loss": 1.1479,
+ "step": 1260
+ },
+ {
+ "epoch": 0.5840258541089566,
+ "grad_norm": 3.5657195698986306,
+ "learning_rate": 8.818023287601117e-06,
+ "loss": 1.1579,
+ "step": 1265
+ },
+ {
+ "epoch": 0.5863342566943675,
+ "grad_norm": 3.85534125869907,
+ "learning_rate": 8.738032517152523e-06,
+ "loss": 1.1748,
+ "step": 1270
+ },
+ {
+ "epoch": 0.5886426592797784,
+ "grad_norm": 3.3807308381583585,
+ "learning_rate": 8.658123718016548e-06,
+ "loss": 1.1365,
+ "step": 1275
+ },
+ {
+ "epoch": 0.5909510618651893,
+ "grad_norm": 3.75547737356039,
+ "learning_rate": 8.578302080682844e-06,
+ "loss": 1.1657,
+ "step": 1280
+ },
+ {
+ "epoch": 0.5932594644506002,
+ "grad_norm": 3.334058557259955,
+ "learning_rate": 8.498572789979446e-06,
+ "loss": 1.1653,
+ "step": 1285
+ },
+ {
+ "epoch": 0.5955678670360111,
+ "grad_norm": 3.596795067568704,
+ "learning_rate": 8.418941024735997e-06,
+ "loss": 1.1909,
+ "step": 1290
+ },
+ {
+ "epoch": 0.5978762696214219,
+ "grad_norm": 3.754106205642103,
+ "learning_rate": 8.33941195744737e-06,
+ "loss": 1.1595,
+ "step": 1295
+ },
+ {
+ "epoch": 0.6001846722068329,
+ "grad_norm": 3.3575559431036988,
+ "learning_rate": 8.259990753937662e-06,
+ "loss": 1.1378,
+ "step": 1300
+ },
+ {
+ "epoch": 0.6024930747922438,
+ "grad_norm": 4.011372021010383,
+ "learning_rate": 8.18068257302466e-06,
+ "loss": 1.1832,
+ "step": 1305
+ },
+ {
+ "epoch": 0.6048014773776547,
+ "grad_norm": 3.404379906541828,
+ "learning_rate": 8.101492566184757e-06,
+ "loss": 1.1592,
+ "step": 1310
+ },
+ {
+ "epoch": 0.6071098799630655,
+ "grad_norm": 3.498132121516362,
+ "learning_rate": 8.022425877218321e-06,
+ "loss": 1.1591,
+ "step": 1315
+ },
+ {
+ "epoch": 0.6094182825484764,
+ "grad_norm": 3.523586580045349,
+ "learning_rate": 7.943487641915595e-06,
+ "loss": 1.1525,
+ "step": 1320
+ },
+ {
+ "epoch": 0.6117266851338874,
+ "grad_norm": 3.6229726894839858,
+ "learning_rate": 7.864682987723082e-06,
+ "loss": 1.1618,
+ "step": 1325
+ },
+ {
+ "epoch": 0.6140350877192983,
+ "grad_norm": 3.696989469787097,
+ "learning_rate": 7.78601703341051e-06,
+ "loss": 1.1824,
+ "step": 1330
+ },
+ {
+ "epoch": 0.6163434903047091,
+ "grad_norm": 3.567967173775001,
+ "learning_rate": 7.70749488873833e-06,
+ "loss": 1.1792,
+ "step": 1335
+ },
+ {
+ "epoch": 0.61865189289012,
+ "grad_norm": 3.399928397766497,
+ "learning_rate": 7.629121654125808e-06,
+ "loss": 1.1438,
+ "step": 1340
+ },
+ {
+ "epoch": 0.6209602954755309,
+ "grad_norm": 3.6344006441397414,
+ "learning_rate": 7.550902420319742e-06,
+ "loss": 1.1591,
+ "step": 1345
+ },
+ {
+ "epoch": 0.6232686980609419,
+ "grad_norm": 3.538106316840523,
+ "learning_rate": 7.472842268063776e-06,
+ "loss": 1.1311,
+ "step": 1350
+ },
+ {
+ "epoch": 0.6255771006463527,
+ "grad_norm": 3.661558906665894,
+ "learning_rate": 7.394946267768381e-06,
+ "loss": 1.1621,
+ "step": 1355
+ },
+ {
+ "epoch": 0.6278855032317636,
+ "grad_norm": 3.6197107279149954,
+ "learning_rate": 7.317219479181517e-06,
+ "loss": 1.1028,
+ "step": 1360
+ },
+ {
+ "epoch": 0.6301939058171745,
+ "grad_norm": 3.4094252840241355,
+ "learning_rate": 7.23966695105996e-06,
+ "loss": 1.119,
+ "step": 1365
+ },
+ {
+ "epoch": 0.6325023084025854,
+ "grad_norm": 3.4085855538144467,
+ "learning_rate": 7.162293720841378e-06,
+ "loss": 1.1438,
+ "step": 1370
+ },
+ {
+ "epoch": 0.6348107109879964,
+ "grad_norm": 4.073406312500022,
+ "learning_rate": 7.085104814317101e-06,
+ "loss": 1.1729,
+ "step": 1375
+ },
+ {
+ "epoch": 0.6371191135734072,
+ "grad_norm": 3.572178264074241,
+ "learning_rate": 7.008105245305699e-06,
+ "loss": 1.1661,
+ "step": 1380
+ },
+ {
+ "epoch": 0.6394275161588181,
+ "grad_norm": 3.81528951625221,
+ "learning_rate": 6.931300015327274e-06,
+ "loss": 1.1571,
+ "step": 1385
+ },
+ {
+ "epoch": 0.641735918744229,
+ "grad_norm": 3.2846636335941763,
+ "learning_rate": 6.854694113278614e-06,
+ "loss": 1.154,
+ "step": 1390
+ },
+ {
+ "epoch": 0.6440443213296398,
+ "grad_norm": 3.2544013227776007,
+ "learning_rate": 6.7782925151091224e-06,
+ "loss": 1.0823,
+ "step": 1395
+ },
+ {
+ "epoch": 0.6463527239150508,
+ "grad_norm": 3.482450898904014,
+ "learning_rate": 6.702100183497613e-06,
+ "loss": 1.1803,
+ "step": 1400
+ },
+ {
+ "epoch": 0.6486611265004617,
+ "grad_norm": 3.412256349030684,
+ "learning_rate": 6.62612206752995e-06,
+ "loss": 1.1643,
+ "step": 1405
+ },
+ {
+ "epoch": 0.6509695290858726,
+ "grad_norm": 3.75196899322532,
+ "learning_rate": 6.550363102377588e-06,
+ "loss": 1.1117,
+ "step": 1410
+ },
+ {
+ "epoch": 0.6532779316712835,
+ "grad_norm": 3.3485189294016258,
+ "learning_rate": 6.474828208976998e-06,
+ "loss": 1.1466,
+ "step": 1415
+ },
+ {
+ "epoch": 0.6555863342566943,
+ "grad_norm": 3.4421443761863104,
+ "learning_rate": 6.3995222937100455e-06,
+ "loss": 1.1468,
+ "step": 1420
+ },
+ {
+ "epoch": 0.6578947368421053,
+ "grad_norm": 3.4653107797221683,
+ "learning_rate": 6.324450248085265e-06,
+ "loss": 1.1418,
+ "step": 1425
+ },
+ {
+ "epoch": 0.6602031394275162,
+ "grad_norm": 3.450235228111911,
+ "learning_rate": 6.249616948420161e-06,
+ "loss": 1.1393,
+ "step": 1430
+ },
+ {
+ "epoch": 0.6625115420129271,
+ "grad_norm": 3.648594332616919,
+ "learning_rate": 6.175027255524446e-06,
+ "loss": 1.1263,
+ "step": 1435
+ },
+ {
+ "epoch": 0.6648199445983379,
+ "grad_norm": 3.50804118935427,
+ "learning_rate": 6.100686014384315e-06,
+ "loss": 1.1497,
+ "step": 1440
+ },
+ {
+ "epoch": 0.6671283471837488,
+ "grad_norm": 3.407303145023877,
+ "learning_rate": 6.026598053847743e-06,
+ "loss": 1.1217,
+ "step": 1445
+ },
+ {
+ "epoch": 0.6694367497691598,
+ "grad_norm": 3.6049741156075426,
+ "learning_rate": 5.952768186310813e-06,
+ "loss": 1.2134,
+ "step": 1450
+ },
+ {
+ "epoch": 0.6717451523545707,
+ "grad_norm": 3.347553717603198,
+ "learning_rate": 5.879201207405136e-06,
+ "loss": 1.1189,
+ "step": 1455
+ },
+ {
+ "epoch": 0.6740535549399815,
+ "grad_norm": 3.7624263901785087,
+ "learning_rate": 5.805901895686344e-06,
+ "loss": 1.1217,
+ "step": 1460
+ },
+ {
+ "epoch": 0.6763619575253924,
+ "grad_norm": 3.6359056480115193,
+ "learning_rate": 5.732875012323712e-06,
+ "loss": 1.1275,
+ "step": 1465
+ },
+ {
+ "epoch": 0.6786703601108033,
+ "grad_norm": 3.5660085050284946,
+ "learning_rate": 5.660125300790873e-06,
+ "loss": 1.153,
+ "step": 1470
+ },
+ {
+ "epoch": 0.6809787626962143,
+ "grad_norm": 3.4188915438262946,
+ "learning_rate": 5.58765748655772e-06,
+ "loss": 1.126,
+ "step": 1475
+ },
+ {
+ "epoch": 0.6832871652816251,
+ "grad_norm": 3.7409360766713995,
+ "learning_rate": 5.5154762767834605e-06,
+ "loss": 1.1312,
+ "step": 1480
+ },
+ {
+ "epoch": 0.685595567867036,
+ "grad_norm": 3.5176838276710787,
+ "learning_rate": 5.443586360010859e-06,
+ "loss": 1.118,
+ "step": 1485
+ },
+ {
+ "epoch": 0.6879039704524469,
+ "grad_norm": 3.940112737940071,
+ "learning_rate": 5.3719924058616975e-06,
+ "loss": 1.1084,
+ "step": 1490
+ },
+ {
+ "epoch": 0.6902123730378578,
+ "grad_norm": 3.5725656073039516,
+ "learning_rate": 5.30069906473345e-06,
+ "loss": 1.1462,
+ "step": 1495
+ },
+ {
+ "epoch": 0.6925207756232687,
+ "grad_norm": 3.585328985764251,
+ "learning_rate": 5.2297109674972166e-06,
+ "loss": 1.1275,
+ "step": 1500
+ },
+ {
+ "epoch": 0.6948291782086796,
+ "grad_norm": 3.7150630899084276,
+ "learning_rate": 5.159032725196946e-06,
+ "loss": 1.1573,
+ "step": 1505
+ },
+ {
+ "epoch": 0.6971375807940905,
+ "grad_norm": 3.4991531847637893,
+ "learning_rate": 5.088668928749891e-06,
+ "loss": 1.1339,
+ "step": 1510
+ },
+ {
+ "epoch": 0.6994459833795014,
+ "grad_norm": 3.337315702796277,
+ "learning_rate": 5.0186241486484245e-06,
+ "loss": 1.1121,
+ "step": 1515
+ },
+ {
+ "epoch": 0.7017543859649122,
+ "grad_norm": 3.166495977462906,
+ "learning_rate": 4.948902934663158e-06,
+ "loss": 1.1207,
+ "step": 1520
+ },
+ {
+ "epoch": 0.7040627885503232,
+ "grad_norm": 3.269883473204096,
+ "learning_rate": 4.879509815547413e-06,
+ "loss": 1.1067,
+ "step": 1525
+ },
+ {
+ "epoch": 0.7063711911357341,
+ "grad_norm": 3.2549683491943138,
+ "learning_rate": 4.810449298743051e-06,
+ "loss": 1.0858,
+ "step": 1530
+ },
+ {
+ "epoch": 0.708679593721145,
+ "grad_norm": 3.673192940396545,
+ "learning_rate": 4.741725870087693e-06,
+ "loss": 1.1674,
+ "step": 1535
+ },
+ {
+ "epoch": 0.7109879963065558,
+ "grad_norm": 3.295243146355197,
+ "learning_rate": 4.673343993523347e-06,
+ "loss": 1.1087,
+ "step": 1540
+ },
+ {
+ "epoch": 0.7132963988919667,
+ "grad_norm": 3.4162872942710867,
+ "learning_rate": 4.605308110806436e-06,
+ "loss": 1.1224,
+ "step": 1545
+ },
+ {
+ "epoch": 0.7156048014773777,
+ "grad_norm": 3.3989883160652865,
+ "learning_rate": 4.537622641219309e-06,
+ "loss": 1.1307,
+ "step": 1550
+ },
+ {
+ "epoch": 0.7179132040627886,
+ "grad_norm": 3.2956559559454663,
+ "learning_rate": 4.47029198128316e-06,
+ "loss": 1.0944,
+ "step": 1555
+ },
+ {
+ "epoch": 0.7202216066481995,
+ "grad_norm": 3.3797718456778765,
+ "learning_rate": 4.403320504472463e-06,
+ "loss": 1.1426,
+ "step": 1560
+ },
+ {
+ "epoch": 0.7225300092336103,
+ "grad_norm": 3.1832339015639826,
+ "learning_rate": 4.336712560930891e-06,
2206
+ "loss": 1.1223,
2207
+ "step": 1565
2208
+ },
2209
+ {
2210
+ "epoch": 0.7248384118190212,
2211
+ "grad_norm": 3.40273969921815,
2212
+ "learning_rate": 4.270472477188755e-06,
2213
+ "loss": 1.1151,
2214
+ "step": 1570
2215
+ },
2216
+ {
2217
+ "epoch": 0.7271468144044322,
2218
+ "grad_norm": 3.32953363908172,
2219
+ "learning_rate": 4.204604555881967e-06,
2220
+ "loss": 1.1055,
2221
+ "step": 1575
2222
+ },
2223
+ {
2224
+ "epoch": 0.7294552169898431,
2225
+ "grad_norm": 3.363759228103856,
2226
+ "learning_rate": 4.139113075472565e-06,
2227
+ "loss": 1.15,
2228
+ "step": 1580
2229
+ },
2230
+ {
2231
+ "epoch": 0.7317636195752539,
2232
+ "grad_norm": 3.5692625205390214,
2233
+ "learning_rate": 4.074002289970801e-06,
2234
+ "loss": 1.1249,
2235
+ "step": 1585
2236
+ },
2237
+ {
2238
+ "epoch": 0.7340720221606648,
2239
+ "grad_norm": 3.6298117912857895,
2240
+ "learning_rate": 4.009276428658836e-06,
2241
+ "loss": 1.0911,
2242
+ "step": 1590
2243
+ },
2244
+ {
2245
+ "epoch": 0.7363804247460757,
2246
+ "grad_norm": 3.501911680130801,
2247
+ "learning_rate": 3.944939695816005e-06,
2248
+ "loss": 1.0591,
2249
+ "step": 1595
2250
+ },
2251
+ {
2252
+ "epoch": 0.7386888273314867,
2253
+ "grad_norm": 3.314254645913856,
2254
+ "learning_rate": 3.8809962704457375e-06,
2255
+ "loss": 1.122,
2256
+ "step": 1600
2257
+ },
2258
+ {
2259
+ "epoch": 0.7409972299168975,
2260
+ "grad_norm": 3.56145944415269,
2261
+ "learning_rate": 3.81745030600411e-06,
2262
+ "loss": 1.1036,
2263
+ "step": 1605
2264
+ },
2265
+ {
2266
+ "epoch": 0.7433056325023084,
2267
+ "grad_norm": 3.4910849192084235,
2268
+ "learning_rate": 3.75430593013006e-06,
2269
+ "loss": 1.1353,
2270
+ "step": 1610
2271
+ },
2272
+ {
2273
+ "epoch": 0.7456140350877193,
2274
+ "grad_norm": 3.325715619787326,
2275
+ "learning_rate": 3.6915672443772644e-06,
2276
+ "loss": 1.1538,
2277
+ "step": 1615
2278
+ },
2279
+ {
2280
+ "epoch": 0.7479224376731302,
2281
+ "grad_norm": 3.5950013679874724,
2282
+ "learning_rate": 3.62923832394774e-06,
2283
+ "loss": 1.0909,
2284
+ "step": 1620
2285
+ },
+ {
+ "epoch": 0.7502308402585411,
+ "grad_norm": 3.1524005532212334,
+ "learning_rate": 3.56732321742712e-06,
+ "loss": 1.1125,
+ "step": 1625
+ },
+ {
+ "epoch": 0.752539242843952,
+ "grad_norm": 3.6760451234626124,
+ "learning_rate": 3.5058259465216828e-06,
+ "loss": 1.1039,
+ "step": 1630
+ },
+ {
+ "epoch": 0.7548476454293629,
+ "grad_norm": 3.341546948891595,
+ "learning_rate": 3.444750505797123e-06,
+ "loss": 1.0531,
+ "step": 1635
+ },
+ {
+ "epoch": 0.7571560480147738,
+ "grad_norm": 3.35627649123262,
+ "learning_rate": 3.384100862419096e-06,
+ "loss": 1.0931,
+ "step": 1640
+ },
+ {
+ "epoch": 0.7594644506001846,
+ "grad_norm": 3.6221419833131527,
+ "learning_rate": 3.3238809558955054e-06,
+ "loss": 1.0797,
+ "step": 1645
+ },
+ {
+ "epoch": 0.7617728531855956,
+ "grad_norm": 3.3487671296828267,
+ "learning_rate": 3.2640946978206266e-06,
+ "loss": 1.0812,
+ "step": 1650
+ },
+ {
+ "epoch": 0.7640812557710065,
+ "grad_norm": 3.441031645390376,
+ "learning_rate": 3.2047459716210306e-06,
+ "loss": 1.1155,
+ "step": 1655
+ },
+ {
+ "epoch": 0.7663896583564174,
+ "grad_norm": 3.4825057106301096,
+ "learning_rate": 3.145838632303325e-06,
+ "loss": 1.096,
+ "step": 1660
+ },
+ {
+ "epoch": 0.7686980609418282,
+ "grad_norm": 3.4525699686491875,
+ "learning_rate": 3.087376506203763e-06,
+ "loss": 1.145,
+ "step": 1665
+ },
+ {
+ "epoch": 0.7710064635272391,
+ "grad_norm": 3.2639030957505715,
+ "learning_rate": 3.0293633907396903e-06,
+ "loss": 1.0711,
+ "step": 1670
+ },
+ {
+ "epoch": 0.7733148661126501,
+ "grad_norm": 3.247147491878351,
+ "learning_rate": 2.971803054162903e-06,
+ "loss": 1.0367,
+ "step": 1675
+ },
+ {
+ "epoch": 0.775623268698061,
+ "grad_norm": 3.3628039668359824,
+ "learning_rate": 2.914699235314855e-06,
+ "loss": 1.1311,
+ "step": 1680
+ },
+ {
+ "epoch": 0.7779316712834718,
+ "grad_norm": 3.294560766749018,
+ "learning_rate": 2.858055643383818e-06,
+ "loss": 1.1303,
+ "step": 1685
+ },
+ {
+ "epoch": 0.7802400738688827,
+ "grad_norm": 3.252460881051861,
+ "learning_rate": 2.8018759576639478e-06,
+ "loss": 1.0894,
+ "step": 1690
+ },
+ {
+ "epoch": 0.7825484764542936,
+ "grad_norm": 3.6541818791755083,
+ "learning_rate": 2.7461638273162895e-06,
+ "loss": 1.1416,
+ "step": 1695
+ },
+ {
+ "epoch": 0.7848568790397045,
+ "grad_norm": 3.3018114290440286,
+ "learning_rate": 2.6909228711317526e-06,
+ "loss": 1.0898,
+ "step": 1700
+ },
+ {
+ "epoch": 0.7871652816251155,
+ "grad_norm": 3.5110479717681704,
+ "learning_rate": 2.6361566772960466e-06,
+ "loss": 1.0887,
+ "step": 1705
+ },
+ {
+ "epoch": 0.7894736842105263,
+ "grad_norm": 3.469571849173682,
+ "learning_rate": 2.5818688031566132e-06,
+ "loss": 1.0182,
+ "step": 1710
+ },
+ {
+ "epoch": 0.7917820867959372,
+ "grad_norm": 3.761287355693432,
+ "learning_rate": 2.5280627749915544e-06,
+ "loss": 1.1246,
+ "step": 1715
+ },
+ {
+ "epoch": 0.7940904893813481,
+ "grad_norm": 3.7171990367681866,
+ "learning_rate": 2.4747420877805905e-06,
+ "loss": 1.1008,
+ "step": 1720
+ },
+ {
+ "epoch": 0.796398891966759,
+ "grad_norm": 3.583342537171837,
+ "learning_rate": 2.421910204978033e-06,
+ "loss": 1.092,
+ "step": 1725
+ },
+ {
+ "epoch": 0.7987072945521699,
+ "grad_norm": 3.3105866570237343,
+ "learning_rate": 2.369570558287819e-06,
+ "loss": 1.0495,
+ "step": 1730
+ },
+ {
+ "epoch": 0.8010156971375808,
+ "grad_norm": 3.453250654565143,
+ "learning_rate": 2.3177265474406084e-06,
+ "loss": 1.0952,
+ "step": 1735
+ },
+ {
+ "epoch": 0.8033240997229917,
+ "grad_norm": 3.2111312681793294,
+ "learning_rate": 2.2663815399729495e-06,
+ "loss": 1.0756,
+ "step": 1740
+ },
+ {
+ "epoch": 0.8056325023084026,
+ "grad_norm": 3.398739502823191,
+ "learning_rate": 2.215538871008538e-06,
+ "loss": 1.0855,
+ "step": 1745
+ },
+ {
+ "epoch": 0.8079409048938134,
+ "grad_norm": 3.4089573083048883,
+ "learning_rate": 2.1652018430415923e-06,
+ "loss": 1.0707,
+ "step": 1750
+ },
+ {
+ "epoch": 0.8102493074792244,
+ "grad_norm": 3.7996382043873744,
+ "learning_rate": 2.115373725722326e-06,
+ "loss": 1.1419,
+ "step": 1755
+ },
+ {
+ "epoch": 0.8125577100646353,
+ "grad_norm": 3.4303103622199203,
+ "learning_rate": 2.066057755644587e-06,
+ "loss": 1.1101,
+ "step": 1760
+ },
+ {
+ "epoch": 0.8148661126500462,
+ "grad_norm": 3.3758394994097363,
+ "learning_rate": 2.0172571361356007e-06,
+ "loss": 1.0975,
+ "step": 1765
+ },
+ {
+ "epoch": 0.817174515235457,
+ "grad_norm": 3.2901551940425673,
+ "learning_rate": 1.9689750370479134e-06,
+ "loss": 1.0797,
+ "step": 1770
+ },
+ {
+ "epoch": 0.8194829178208679,
+ "grad_norm": 3.661068632899665,
+ "learning_rate": 1.921214594553488e-06,
+ "loss": 1.1287,
+ "step": 1775
+ },
+ {
+ "epoch": 0.8217913204062789,
+ "grad_norm": 3.5442080312978415,
+ "learning_rate": 1.8739789109399954e-06,
+ "loss": 1.1514,
+ "step": 1780
+ },
+ {
+ "epoch": 0.8240997229916898,
+ "grad_norm": 3.3534741257777325,
+ "learning_rate": 1.8272710544093019e-06,
+ "loss": 1.0824,
+ "step": 1785
+ },
+ {
+ "epoch": 0.8264081255771006,
+ "grad_norm": 3.570055818522298,
+ "learning_rate": 1.7810940588781811e-06,
+ "loss": 1.1313,
+ "step": 1790
+ },
+ {
+ "epoch": 0.8287165281625115,
+ "grad_norm": 3.3907592881825352,
+ "learning_rate": 1.7354509237812334e-06,
+ "loss": 1.0458,
+ "step": 1795
+ },
+ {
+ "epoch": 0.8310249307479224,
+ "grad_norm": 3.7660635086416794,
+ "learning_rate": 1.690344613876066e-06,
+ "loss": 1.109,
+ "step": 1800
+ },
+ {
+ "epoch": 0.8333333333333334,
+ "grad_norm": 20.624336407323348,
+ "learning_rate": 1.64577805905072e-06,
+ "loss": 1.0872,
+ "step": 1805
+ },
+ {
+ "epoch": 0.8356417359187442,
+ "grad_norm": 3.38434013035599,
+ "learning_rate": 1.601754154133347e-06,
+ "loss": 1.0943,
+ "step": 1810
+ },
+ {
+ "epoch": 0.8379501385041551,
+ "grad_norm": 3.3282071197431318,
+ "learning_rate": 1.558275758704183e-06,
+ "loss": 1.0983,
+ "step": 1815
+ },
+ {
+ "epoch": 0.840258541089566,
+ "grad_norm": 3.4156960745203286,
+ "learning_rate": 1.5153456969098013e-06,
+ "loss": 1.0381,
+ "step": 1820
+ },
+ {
+ "epoch": 0.8425669436749769,
+ "grad_norm": 3.3418973703656274,
+ "learning_rate": 1.4729667572796735e-06,
+ "loss": 1.1452,
+ "step": 1825
+ },
+ {
+ "epoch": 0.8448753462603878,
+ "grad_norm": 3.333897377962453,
+ "learning_rate": 1.431141692545036e-06,
+ "loss": 1.1076,
+ "step": 1830
+ },
+ {
+ "epoch": 0.8471837488457987,
+ "grad_norm": 3.402941306050666,
+ "learning_rate": 1.389873219460085e-06,
+ "loss": 1.0869,
+ "step": 1835
+ },
+ {
+ "epoch": 0.8494921514312096,
+ "grad_norm": 3.3313186519496423,
+ "learning_rate": 1.349164018625513e-06,
+ "loss": 1.0765,
+ "step": 1840
+ },
+ {
+ "epoch": 0.8518005540166205,
+ "grad_norm": 3.6011720414080566,
+ "learning_rate": 1.3090167343143911e-06,
+ "loss": 1.0846,
+ "step": 1845
+ },
+ {
+ "epoch": 0.8541089566020313,
+ "grad_norm": 3.629326020817196,
+ "learning_rate": 1.2694339743004037e-06,
+ "loss": 1.1088,
+ "step": 1850
+ },
+ {
+ "epoch": 0.8564173591874423,
+ "grad_norm": 3.6305906598709767,
+ "learning_rate": 1.2304183096884626e-06,
+ "loss": 1.0875,
+ "step": 1855
+ },
+ {
+ "epoch": 0.8587257617728532,
+ "grad_norm": 3.35865168543221,
+ "learning_rate": 1.1919722747477024e-06,
+ "loss": 1.1143,
+ "step": 1860
+ },
+ {
+ "epoch": 0.8610341643582641,
+ "grad_norm": 3.3889339992199177,
+ "learning_rate": 1.1540983667468686e-06,
+ "loss": 1.0916,
+ "step": 1865
+ },
+ {
+ "epoch": 0.863342566943675,
+ "grad_norm": 3.3133014347890324,
+ "learning_rate": 1.1167990457920985e-06,
+ "loss": 1.0877,
+ "step": 1870
+ },
+ {
+ "epoch": 0.8656509695290858,
+ "grad_norm": 3.415023896862017,
+ "learning_rate": 1.0800767346671347e-06,
+ "loss": 1.0284,
+ "step": 1875
+ },
+ {
+ "epoch": 0.8679593721144968,
+ "grad_norm": 3.322962975732958,
+ "learning_rate": 1.043933818675944e-06,
+ "loss": 1.0782,
+ "step": 1880
+ },
+ {
+ "epoch": 0.8702677746999077,
+ "grad_norm": 3.583896655771928,
+ "learning_rate": 1.008372645487785e-06,
+ "loss": 1.08,
+ "step": 1885
+ },
+ {
+ "epoch": 0.8725761772853186,
+ "grad_norm": 3.3057678718948726,
+ "learning_rate": 9.733955249847183e-07,
+ "loss": 1.1034,
+ "step": 1890
+ },
+ {
+ "epoch": 0.8748845798707294,
+ "grad_norm": 3.4387092657320997,
+ "learning_rate": 9.390047291115567e-07,
+ "loss": 1.0915,
+ "step": 1895
+ },
+ {
+ "epoch": 0.8771929824561403,
+ "grad_norm": 3.8029482282950324,
+ "learning_rate": 9.052024917282987e-07,
+ "loss": 1.057,
+ "step": 1900
+ },
+ {
+ "epoch": 0.8795013850415513,
+ "grad_norm": 3.3990790831971465,
+ "learning_rate": 8.719910084650262e-07,
+ "loss": 1.0725,
+ "step": 1905
+ },
+ {
+ "epoch": 0.8818097876269622,
+ "grad_norm": 3.262416726762208,
+ "learning_rate": 8.393724365792866e-07,
+ "loss": 1.1028,
+ "step": 1910
+ },
+ {
+ "epoch": 0.884118190212373,
+ "grad_norm": 3.551691283783414,
+ "learning_rate": 8.073488948159691e-07,
+ "loss": 1.0546,
+ "step": 1915
+ },
+ {
+ "epoch": 0.8864265927977839,
+ "grad_norm": 3.5211563130144197,
+ "learning_rate": 7.759224632696793e-07,
+ "loss": 1.1024,
+ "step": 1920
+ },
+ {
+ "epoch": 0.8887349953831948,
+ "grad_norm": 3.5958803804208976,
+ "learning_rate": 7.450951832496233e-07,
+ "loss": 1.0698,
+ "step": 1925
+ },
+ {
+ "epoch": 0.8910433979686058,
+ "grad_norm": 4.107811963680795,
+ "learning_rate": 7.148690571470251e-07,
+ "loss": 1.0613,
+ "step": 1930
+ },
+ {
+ "epoch": 0.8933518005540166,
+ "grad_norm": 3.6280688174940416,
+ "learning_rate": 6.852460483050494e-07,
+ "loss": 1.0987,
+ "step": 1935
+ },
+ {
+ "epoch": 0.8956602031394275,
+ "grad_norm": 3.4197153407779055,
+ "learning_rate": 6.562280808912768e-07,
+ "loss": 1.081,
+ "step": 1940
+ },
+ {
+ "epoch": 0.8979686057248384,
+ "grad_norm": 3.3975321682078494,
+ "learning_rate": 6.278170397727179e-07,
+ "loss": 1.0881,
+ "step": 1945
+ },
+ {
+ "epoch": 0.9002770083102493,
+ "grad_norm": 3.385824657440924,
+ "learning_rate": 6.000147703933845e-07,
+ "loss": 1.0725,
+ "step": 1950
+ },
+ {
+ "epoch": 0.9025854108956602,
+ "grad_norm": 3.733106691108023,
+ "learning_rate": 5.728230786544153e-07,
+ "loss": 1.0886,
+ "step": 1955
+ },
+ {
+ "epoch": 0.9048938134810711,
+ "grad_norm": 3.3831515124529288,
+ "learning_rate": 5.46243730796776e-07,
+ "loss": 1.0854,
+ "step": 1960
+ },
+ {
+ "epoch": 0.907202216066482,
+ "grad_norm": 3.4106013139907065,
+ "learning_rate": 5.202784532865302e-07,
+ "loss": 1.114,
+ "step": 1965
+ },
+ {
+ "epoch": 0.9095106186518929,
+ "grad_norm": 3.130011381325973,
+ "learning_rate": 4.949289327026952e-07,
+ "loss": 1.0873,
+ "step": 1970
+ },
+ {
+ "epoch": 0.9118190212373037,
+ "grad_norm": 3.3600219468750394,
+ "learning_rate": 4.7019681562769816e-07,
+ "loss": 1.0689,
+ "step": 1975
+ },
+ {
+ "epoch": 0.9141274238227147,
+ "grad_norm": 3.379655670825615,
+ "learning_rate": 4.460837085404113e-07,
+ "loss": 1.0874,
+ "step": 1980
+ },
+ {
+ "epoch": 0.9164358264081256,
+ "grad_norm": 3.324809563310868,
+ "learning_rate": 4.225911777118097e-07,
+ "loss": 1.0894,
+ "step": 1985
+ },
+ {
+ "epoch": 0.9187442289935365,
+ "grad_norm": 3.4668181744618196,
+ "learning_rate": 3.9972074910323066e-07,
+ "loss": 1.0896,
+ "step": 1990
+ },
+ {
+ "epoch": 0.9210526315789473,
+ "grad_norm": 3.4175120046363276,
+ "learning_rate": 3.7747390826725736e-07,
+ "loss": 1.0608,
+ "step": 1995
+ },
+ {
+ "epoch": 0.9233610341643582,
+ "grad_norm": 3.365932789028912,
+ "learning_rate": 3.5585210025122166e-07,
+ "loss": 1.0465,
+ "step": 2000
+ },
+ {
+ "epoch": 0.9256694367497692,
+ "grad_norm": 3.3721429412301442,
+ "learning_rate": 3.3485672950334447e-07,
+ "loss": 1.0782,
+ "step": 2005
+ },
+ {
+ "epoch": 0.9279778393351801,
+ "grad_norm": 3.402893452692765,
+ "learning_rate": 3.1448915978150365e-07,
+ "loss": 1.0575,
+ "step": 2010
+ },
+ {
+ "epoch": 0.930286241920591,
+ "grad_norm": 3.3246351042614606,
+ "learning_rate": 2.947507140646588e-07,
+ "loss": 1.093,
+ "step": 2015
+ },
+ {
+ "epoch": 0.9325946445060018,
+ "grad_norm": 3.42392243323848,
+ "learning_rate": 2.756426744669105e-07,
+ "loss": 1.0709,
+ "step": 2020
+ },
+ {
+ "epoch": 0.9349030470914127,
+ "grad_norm": 3.3870385964627565,
+ "learning_rate": 2.57166282154222e-07,
+ "loss": 1.0944,
+ "step": 2025
+ },
+ {
+ "epoch": 0.9372114496768237,
+ "grad_norm": 3.4345800654530128,
+ "learning_rate": 2.393227372638018e-07,
+ "loss": 1.0829,
+ "step": 2030
+ },
+ {
+ "epoch": 0.9395198522622346,
+ "grad_norm": 3.2304527099741094,
+ "learning_rate": 2.221131988261438e-07,
+ "loss": 1.0663,
+ "step": 2035
+ },
+ {
+ "epoch": 0.9418282548476454,
+ "grad_norm": 3.4212248154000324,
+ "learning_rate": 2.055387846897472e-07,
+ "loss": 1.0608,
+ "step": 2040
+ },
+ {
+ "epoch": 0.9441366574330563,
+ "grad_norm": 3.3424495231710356,
+ "learning_rate": 1.8960057144850163e-07,
+ "loss": 1.0513,
+ "step": 2045
+ },
+ {
+ "epoch": 0.9464450600184672,
+ "grad_norm": 8.42913586604929,
+ "learning_rate": 1.742995943717607e-07,
+ "loss": 1.0698,
+ "step": 2050
+ },
+ {
+ "epoch": 0.9487534626038782,
+ "grad_norm": 4.03605470816158,
+ "learning_rate": 1.5963684733709462e-07,
+ "loss": 1.0787,
+ "step": 2055
+ },
+ {
+ "epoch": 0.951061865189289,
+ "grad_norm": 3.572766321551919,
+ "learning_rate": 1.4561328276573415e-07,
+ "loss": 1.0625,
+ "step": 2060
+ },
+ {
+ "epoch": 0.9533702677746999,
+ "grad_norm": 3.213406555168112,
+ "learning_rate": 1.3222981156070126e-07,
+ "loss": 1.0861,
+ "step": 2065
+ },
+ {
+ "epoch": 0.9556786703601108,
+ "grad_norm": 3.216022210724082,
+ "learning_rate": 1.1948730304764622e-07,
+ "loss": 1.0572,
+ "step": 2070
+ },
+ {
+ "epoch": 0.9579870729455217,
+ "grad_norm": 3.8142801195990237,
+ "learning_rate": 1.073865849183786e-07,
+ "loss": 1.1151,
+ "step": 2075
+ },
+ {
+ "epoch": 0.9602954755309326,
+ "grad_norm": 3.2011503381896413,
+ "learning_rate": 9.592844317710238e-08,
+ "loss": 1.0585,
+ "step": 2080
+ },
+ {
+ "epoch": 0.9626038781163435,
+ "grad_norm": 3.3780038857652226,
+ "learning_rate": 8.511362208936447e-08,
+ "loss": 1.0591,
+ "step": 2085
+ },
+ {
+ "epoch": 0.9649122807017544,
+ "grad_norm": 3.3212612452494295,
+ "learning_rate": 7.494282413371135e-08,
+ "loss": 1.0787,
+ "step": 2090
+ },
+ {
+ "epoch": 0.9672206832871653,
+ "grad_norm": 3.797857330316498,
+ "learning_rate": 6.541670995605321e-08,
+ "loss": 1.0859,
+ "step": 2095
+ },
+ {
+ "epoch": 0.9695290858725761,
+ "grad_norm": 3.153773745189338,
+ "learning_rate": 5.653589832675943e-08,
+ "loss": 1.0983,
+ "step": 2100
+ },
+ {
+ "epoch": 0.9718374884579871,
+ "grad_norm": 3.4652822167549906,
+ "learning_rate": 4.830096610045854e-08,
+ "loss": 1.0713,
+ "step": 2105
+ },
+ {
+ "epoch": 0.974145891043398,
+ "grad_norm": 3.6601967632905796,
+ "learning_rate": 4.071244817857589e-08,
+ "loss": 1.1118,
+ "step": 2110
+ },
+ {
+ "epoch": 0.9764542936288089,
+ "grad_norm": 3.135385406063897,
+ "learning_rate": 3.3770837474584874e-08,
+ "loss": 1.072,
+ "step": 2115
+ },
+ {
+ "epoch": 0.9787626962142197,
+ "grad_norm": 3.4884714571677784,
+ "learning_rate": 2.747658488199023e-08,
+ "loss": 1.0738,
+ "step": 2120
+ },
+ {
+ "epoch": 0.9810710987996306,
+ "grad_norm": 3.7803263448925706,
+ "learning_rate": 2.1830099245040427e-08,
+ "loss": 1.0549,
+ "step": 2125
+ },
+ {
+ "epoch": 0.9833795013850416,
+ "grad_norm": 3.3045019552603585,
+ "learning_rate": 1.683174733216997e-08,
+ "loss": 1.1129,
+ "step": 2130
+ },
+ {
+ "epoch": 0.9856879039704525,
+ "grad_norm": 3.2212668180182784,
+ "learning_rate": 1.248185381217848e-08,
+ "loss": 1.0777,
+ "step": 2135
+ },
+ {
+ "epoch": 0.9879963065558633,
+ "grad_norm": 3.324768260260177,
+ "learning_rate": 8.780701233139789e-09,
+ "loss": 1.0503,
+ "step": 2140
+ },
+ {
+ "epoch": 0.9903047091412742,
+ "grad_norm": 3.214869100486745,
+ "learning_rate": 5.728530004051047e-09,
+ "loss": 1.0367,
+ "step": 2145
+ },
+ {
+ "epoch": 0.9926131117266851,
+ "grad_norm": 3.3583215428853666,
+ "learning_rate": 3.325538379211901e-09,
+ "loss": 1.0554,
+ "step": 2150
+ },
+ {
+ "epoch": 0.9949215143120961,
+ "grad_norm": 4.075445923312751,
+ "learning_rate": 1.5718824453525572e-09,
+ "loss": 1.1222,
+ "step": 2155
+ },
+ {
+ "epoch": 0.997229916897507,
+ "grad_norm": 3.3599147688364903,
+ "learning_rate": 4.676761114941197e-10,
+ "loss": 1.0646,
+ "step": 2160
+ },
+ {
+ "epoch": 0.9995383194829178,
+ "grad_norm": 3.4496692841727543,
+ "learning_rate": 1.2991101545622998e-11,
+ "loss": 1.1038,
+ "step": 2165
+ },
+ {
+ "epoch": 1.0,
+ "eval_loss": 1.1177629232406616,
+ "eval_runtime": 1154.8442,
+ "eval_samples_per_second": 26.579,
+ "eval_steps_per_second": 0.831,
+ "step": 2166
+ },
+ {
+ "epoch": 1.0,
+ "step": 2166,
+ "total_flos": 113379083550720.0,
+ "train_loss": 1.171416565762112,
+ "train_runtime": 11018.7418,
+ "train_samples_per_second": 6.29,
+ "train_steps_per_second": 0.197
+ }
+ ],
+ "logging_steps": 5,
+ "max_steps": 2166,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 1,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 113379083550720.0,
+ "train_batch_size": 8,
+ "trial_name": null,
+ "trial_params": null
+ }