Text Classification
Transformers
Safetensors
English
distilbert
Safety
Content Moderation
Hate Speech Detection
Toxicity Detection
FlameF0X committed (verified)
Commit a572ebe · 1 Parent(s): 2f29350

Upload folder using huggingface_hub

checkpoint-3000/config.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "activation": "gelu",
+ "architectures": [
+ "DistilBertForSequenceClassification"
+ ],
+ "attention_dropout": 0.1,
+ "dim": 768,
+ "dropout": 0.3,
+ "dtype": "float32",
+ "hidden_dim": 3072,
+ "initializer_range": 0.02,
+ "max_position_embeddings": 512,
+ "model_type": "distilbert",
+ "n_heads": 12,
+ "n_layers": 6,
+ "pad_token_id": 0,
+ "problem_type": "single_label_classification",
+ "qa_dropout": 0.1,
+ "seq_classif_dropout": 0.2,
+ "sinusoidal_pos_embds": false,
+ "tie_weights_": true,
+ "transformers_version": "4.57.1",
+ "vocab_size": 30522
+ }
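The config above fully determines the transformer's shape. As a quick sanity check using only values from this diff, the per-head attention dimension and the feed-forward expansion factor can be derived directly (this is an illustrative sketch, not part of the uploaded files):

```python
import json

# Shape-related values copied from checkpoint-3000/config.json in this commit.
config = json.loads("""
{
  "dim": 768,
  "hidden_dim": 3072,
  "n_heads": 12,
  "n_layers": 6,
  "max_position_embeddings": 512,
  "vocab_size": 30522
}
""")

# Each attention head operates on dim / n_heads channels.
head_dim = config["dim"] // config["n_heads"]
print(head_dim)  # 64

# The feed-forward block expands the hidden size by hidden_dim / dim.
ffn_factor = config["hidden_dim"] // config["dim"]
print(ffn_factor)  # 4
```

These ratios (64-dim heads, 4x FFN expansion, 6 layers) match the standard distilbert-base layout; the classification-specific settings here are `problem_type`, `dropout` 0.3, and `seq_classif_dropout` 0.2.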
checkpoint-3000/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2a8eaf4876f4b852e5653d304488e78b7bae8529e8ea4dd2b83e0ca6b30e96e0
+ size 267832560
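What is checked into git for `model.safetensors` is not the weights themselves but a Git LFS pointer: three `key value` lines naming the spec version, the content hash, and the byte size. A minimal sketch of parsing such a pointer (the pointer text is the file content shown in this diff):

```python
# Content of checkpoint-3000/model.safetensors as stored in git:
# a Git LFS pointer, not the actual tensor data.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:2a8eaf4876f4b852e5653d304488e78b7bae8529e8ea4dd2b83e0ca6b30e96e0
size 267832560
"""

# Each pointer line is "key value"; split on the first space.
fields = dict(line.split(" ", 1) for line in pointer.splitlines())
algo, digest = fields["oid"].split(":", 1)

print(algo)                         # sha256
print(len(digest))                  # 64 hex characters
print(int(fields["size"]) / 1e6)    # ~267.8 MB
```

At 267,832,560 bytes of float32 weights, the payload works out to roughly 67M parameters, consistent with a distilbert-base encoder plus a sequence-classification head.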
checkpoint-3000/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:da2b0450e6a5519243581f5b565a53ee6b7973e0ac487ae19b0fd5a00438d3fe
+ size 535727290
checkpoint-3000/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:053d6cdb71ddd1d4ac72326fefec638123d638f02de75d73320dddafa3a15da1
+ size 14244
checkpoint-3000/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:16e807fd5448ed5db845d1bcf75950216827854b995e5193797c25bd4b557946
+ size 1064
checkpoint-3000/special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+ "cls_token": "[CLS]",
+ "mask_token": "[MASK]",
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "unk_token": "[UNK]"
+ }
checkpoint-3000/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-3000/tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "101": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "102": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "103": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "clean_up_tokenization_spaces": false,
+ "cls_token": "[CLS]",
+ "do_lower_case": true,
+ "extra_special_tokens": {},
+ "mask_token": "[MASK]",
+ "model_max_length": 512,
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "DistilBertTokenizer",
+ "unk_token": "[UNK]"
+ }
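The `added_tokens_decoder` block maps vocabulary ids to the five special tokens, matching the standard BERT WordPiece layout (`[PAD]`=0, `[UNK]`=100, `[CLS]`=101, `[SEP]`=102, `[MASK]`=103). A small sketch of how those ids frame a classification input; the two middle token ids are hypothetical wordpiece ids used only for illustration:

```python
# Special-token ids copied from added_tokens_decoder in
# checkpoint-3000/tokenizer_config.json.
added_tokens = {0: "[PAD]", 100: "[UNK]", 101: "[CLS]", 102: "[SEP]", 103: "[MASK]"}

# A single-sequence input is framed as [CLS] <wordpieces> [SEP];
# 7592 and 2088 below are hypothetical wordpiece ids.
ids = [101, 7592, 2088, 102]
decoded = [added_tokens.get(i, "<wordpiece>") for i in ids]
print(decoded)  # ['[CLS]', '<wordpiece>', '<wordpiece>', '[SEP]']
```

Because `model_max_length` is 512 (matching `max_position_embeddings` in config.json), framed sequences longer than 512 ids must be truncated before they reach the model.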
checkpoint-3000/trainer_state.json ADDED
@@ -0,0 +1,2134 @@
+ {
+ "best_global_step": null,
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 1.7231476163124642,
+ "eval_steps": 100,
+ "global_step": 3000,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.005743825387708214,
+ "grad_norm": 1.664604902267456,
+ "learning_rate": 1.9948305571510626e-05,
+ "loss": 0.6766,
+ "step": 10
+ },
+ {
+ "epoch": 0.011487650775416428,
+ "grad_norm": 1.8649318218231201,
+ "learning_rate": 1.9890867317633546e-05,
+ "loss": 0.6443,
+ "step": 20
+ },
+ {
+ "epoch": 0.01723147616312464,
+ "grad_norm": 2.0676939487457275,
+ "learning_rate": 1.9833429063756463e-05,
+ "loss": 0.6516,
+ "step": 30
+ },
+ {
+ "epoch": 0.022975301550832855,
+ "grad_norm": 2.077414035797119,
+ "learning_rate": 1.9775990809879383e-05,
+ "loss": 0.5468,
+ "step": 40
+ },
+ {
+ "epoch": 0.02871912693854107,
+ "grad_norm": 2.4310338497161865,
+ "learning_rate": 1.97185525560023e-05,
+ "loss": 0.6289,
+ "step": 50
+ },
+ {
+ "epoch": 0.03446295232624928,
+ "grad_norm": 2.605480670928955,
+ "learning_rate": 1.9661114302125216e-05,
+ "loss": 0.583,
+ "step": 60
+ },
+ {
+ "epoch": 0.040206777713957496,
+ "grad_norm": 2.7097268104553223,
+ "learning_rate": 1.9603676048248136e-05,
+ "loss": 0.6029,
+ "step": 70
+ },
+ {
+ "epoch": 0.04595060310166571,
+ "grad_norm": 4.125054359436035,
+ "learning_rate": 1.9546237794371053e-05,
+ "loss": 0.5545,
+ "step": 80
+ },
+ {
+ "epoch": 0.051694428489373924,
+ "grad_norm": 4.592983245849609,
+ "learning_rate": 1.948879954049397e-05,
+ "loss": 0.5027,
+ "step": 90
+ },
+ {
+ "epoch": 0.05743825387708214,
+ "grad_norm": 4.799990653991699,
+ "learning_rate": 1.9431361286616886e-05,
+ "loss": 0.4913,
+ "step": 100
+ },
+ {
+ "epoch": 0.06318207926479034,
+ "grad_norm": 5.146159648895264,
+ "learning_rate": 1.9373923032739806e-05,
+ "loss": 0.5419,
+ "step": 110
+ },
+ {
+ "epoch": 0.06892590465249857,
+ "grad_norm": 6.291711807250977,
+ "learning_rate": 1.9316484778862726e-05,
+ "loss": 0.478,
+ "step": 120
+ },
+ {
+ "epoch": 0.07466973004020677,
+ "grad_norm": 4.076691627502441,
+ "learning_rate": 1.9259046524985643e-05,
+ "loss": 0.489,
+ "step": 130
+ },
+ {
+ "epoch": 0.08041355542791499,
+ "grad_norm": 5.057150363922119,
+ "learning_rate": 1.920160827110856e-05,
+ "loss": 0.5435,
+ "step": 140
+ },
+ {
+ "epoch": 0.0861573808156232,
+ "grad_norm": 3.5335211753845215,
+ "learning_rate": 1.914417001723148e-05,
+ "loss": 0.4445,
+ "step": 150
+ },
+ {
+ "epoch": 0.09190120620333142,
+ "grad_norm": 6.209961414337158,
+ "learning_rate": 1.9086731763354396e-05,
+ "loss": 0.4516,
+ "step": 160
+ },
+ {
+ "epoch": 0.09764503159103963,
+ "grad_norm": 7.1031341552734375,
+ "learning_rate": 1.9029293509477313e-05,
+ "loss": 0.576,
+ "step": 170
+ },
+ {
+ "epoch": 0.10338885697874785,
+ "grad_norm": 4.959975719451904,
+ "learning_rate": 1.8971855255600233e-05,
+ "loss": 0.4841,
+ "step": 180
+ },
+ {
+ "epoch": 0.10913268236645605,
+ "grad_norm": 4.904316425323486,
+ "learning_rate": 1.891441700172315e-05,
+ "loss": 0.5007,
+ "step": 190
+ },
+ {
+ "epoch": 0.11487650775416428,
+ "grad_norm": 8.833961486816406,
+ "learning_rate": 1.8856978747846066e-05,
+ "loss": 0.5416,
+ "step": 200
+ },
+ {
+ "epoch": 0.12062033314187248,
+ "grad_norm": 6.040881156921387,
+ "learning_rate": 1.8799540493968982e-05,
+ "loss": 0.5162,
+ "step": 210
+ },
+ {
+ "epoch": 0.1263641585295807,
+ "grad_norm": 7.167252063751221,
+ "learning_rate": 1.8742102240091902e-05,
+ "loss": 0.4965,
+ "step": 220
+ },
+ {
+ "epoch": 0.13210798391728892,
+ "grad_norm": 3.9503183364868164,
+ "learning_rate": 1.8684663986214822e-05,
+ "loss": 0.5129,
+ "step": 230
+ },
+ {
+ "epoch": 0.13785180930499713,
+ "grad_norm": 4.403937339782715,
+ "learning_rate": 1.862722573233774e-05,
+ "loss": 0.5192,
+ "step": 240
+ },
+ {
+ "epoch": 0.14359563469270534,
+ "grad_norm": 6.813930034637451,
+ "learning_rate": 1.8569787478460656e-05,
+ "loss": 0.4927,
+ "step": 250
+ },
+ {
+ "epoch": 0.14933946008041354,
+ "grad_norm": 4.355352878570557,
+ "learning_rate": 1.8512349224583576e-05,
+ "loss": 0.4857,
+ "step": 260
+ },
+ {
+ "epoch": 0.15508328546812178,
+ "grad_norm": 4.012150287628174,
+ "learning_rate": 1.8454910970706492e-05,
+ "loss": 0.3763,
+ "step": 270
+ },
+ {
+ "epoch": 0.16082711085582999,
+ "grad_norm": 7.179994106292725,
+ "learning_rate": 1.839747271682941e-05,
+ "loss": 0.4625,
+ "step": 280
+ },
+ {
+ "epoch": 0.1665709362435382,
+ "grad_norm": 3.701215982437134,
+ "learning_rate": 1.834003446295233e-05,
+ "loss": 0.4954,
+ "step": 290
+ },
+ {
+ "epoch": 0.1723147616312464,
+ "grad_norm": 5.150369167327881,
+ "learning_rate": 1.8282596209075246e-05,
+ "loss": 0.4875,
+ "step": 300
+ },
+ {
+ "epoch": 0.17805858701895463,
+ "grad_norm": 4.432060241699219,
+ "learning_rate": 1.8225157955198162e-05,
+ "loss": 0.5032,
+ "step": 310
+ },
+ {
+ "epoch": 0.18380241240666284,
+ "grad_norm": 4.560088634490967,
+ "learning_rate": 1.816771970132108e-05,
+ "loss": 0.4443,
+ "step": 320
+ },
+ {
+ "epoch": 0.18954623779437105,
+ "grad_norm": 6.000060558319092,
+ "learning_rate": 1.8110281447444e-05,
+ "loss": 0.4805,
+ "step": 330
+ },
+ {
+ "epoch": 0.19529006318207925,
+ "grad_norm": 6.462589740753174,
+ "learning_rate": 1.805284319356692e-05,
+ "loss": 0.4009,
+ "step": 340
+ },
+ {
+ "epoch": 0.2010338885697875,
+ "grad_norm": 4.6327104568481445,
+ "learning_rate": 1.7995404939689835e-05,
+ "loss": 0.5126,
+ "step": 350
+ },
+ {
+ "epoch": 0.2067777139574957,
+ "grad_norm": 5.640307426452637,
+ "learning_rate": 1.7937966685812752e-05,
+ "loss": 0.4308,
+ "step": 360
+ },
+ {
+ "epoch": 0.2125215393452039,
+ "grad_norm": 6.553674697875977,
+ "learning_rate": 1.7880528431935672e-05,
+ "loss": 0.4686,
+ "step": 370
+ },
+ {
+ "epoch": 0.2182653647329121,
+ "grad_norm": 6.482476234436035,
+ "learning_rate": 1.782309017805859e-05,
+ "loss": 0.4214,
+ "step": 380
+ },
+ {
+ "epoch": 0.22400919012062034,
+ "grad_norm": 6.897400379180908,
+ "learning_rate": 1.7765651924181505e-05,
+ "loss": 0.4827,
+ "step": 390
+ },
+ {
+ "epoch": 0.22975301550832855,
+ "grad_norm": 3.935074806213379,
+ "learning_rate": 1.7708213670304425e-05,
+ "loss": 0.4212,
+ "step": 400
+ },
+ {
+ "epoch": 0.23549684089603676,
+ "grad_norm": 5.766780853271484,
+ "learning_rate": 1.7650775416427342e-05,
+ "loss": 0.4421,
+ "step": 410
+ },
+ {
+ "epoch": 0.24124066628374496,
+ "grad_norm": 8.834909439086914,
+ "learning_rate": 1.759333716255026e-05,
+ "loss": 0.4676,
+ "step": 420
+ },
+ {
+ "epoch": 0.2469844916714532,
+ "grad_norm": 4.853402614593506,
+ "learning_rate": 1.7535898908673175e-05,
+ "loss": 0.4888,
+ "step": 430
+ },
+ {
+ "epoch": 0.2527283170591614,
+ "grad_norm": 5.489238739013672,
+ "learning_rate": 1.7478460654796095e-05,
+ "loss": 0.3763,
+ "step": 440
+ },
+ {
+ "epoch": 0.2584721424468696,
+ "grad_norm": 3.805001974105835,
+ "learning_rate": 1.7421022400919015e-05,
+ "loss": 0.5266,
+ "step": 450
+ },
+ {
+ "epoch": 0.26421596783457785,
+ "grad_norm": 5.9350080490112305,
+ "learning_rate": 1.7363584147041932e-05,
+ "loss": 0.4053,
+ "step": 460
+ },
+ {
+ "epoch": 0.269959793222286,
+ "grad_norm": 8.31826400756836,
+ "learning_rate": 1.730614589316485e-05,
+ "loss": 0.4775,
+ "step": 470
+ },
+ {
+ "epoch": 0.27570361860999426,
+ "grad_norm": 5.709866046905518,
+ "learning_rate": 1.724870763928777e-05,
+ "loss": 0.3681,
+ "step": 480
+ },
+ {
+ "epoch": 0.2814474439977025,
+ "grad_norm": 2.6657683849334717,
+ "learning_rate": 1.7191269385410685e-05,
+ "loss": 0.3913,
+ "step": 490
+ },
+ {
+ "epoch": 0.2871912693854107,
+ "grad_norm": 8.942373275756836,
+ "learning_rate": 1.71338311315336e-05,
+ "loss": 0.4454,
+ "step": 500
+ },
+ {
+ "epoch": 0.2929350947731189,
+ "grad_norm": 7.566854476928711,
+ "learning_rate": 1.707639287765652e-05,
+ "loss": 0.4704,
+ "step": 510
+ },
+ {
+ "epoch": 0.2986789201608271,
+ "grad_norm": 5.819560527801514,
+ "learning_rate": 1.7018954623779438e-05,
+ "loss": 0.4245,
+ "step": 520
+ },
+ {
+ "epoch": 0.3044227455485353,
+ "grad_norm": 7.269554138183594,
+ "learning_rate": 1.6961516369902355e-05,
+ "loss": 0.523,
+ "step": 530
+ },
+ {
+ "epoch": 0.31016657093624356,
+ "grad_norm": 4.569669246673584,
+ "learning_rate": 1.690407811602527e-05,
+ "loss": 0.5053,
+ "step": 540
+ },
+ {
+ "epoch": 0.31591039632395174,
+ "grad_norm": 6.8373332023620605,
+ "learning_rate": 1.684663986214819e-05,
+ "loss": 0.4592,
+ "step": 550
+ },
+ {
+ "epoch": 0.32165422171165997,
+ "grad_norm": 7.736797332763672,
+ "learning_rate": 1.678920160827111e-05,
+ "loss": 0.4756,
+ "step": 560
+ },
+ {
+ "epoch": 0.3273980470993682,
+ "grad_norm": 5.857370853424072,
+ "learning_rate": 1.6731763354394028e-05,
+ "loss": 0.4906,
+ "step": 570
+ },
+ {
+ "epoch": 0.3331418724870764,
+ "grad_norm": 4.2734785079956055,
+ "learning_rate": 1.6674325100516945e-05,
+ "loss": 0.4623,
+ "step": 580
+ },
+ {
+ "epoch": 0.3388856978747846,
+ "grad_norm": 7.859753131866455,
+ "learning_rate": 1.6616886846639865e-05,
+ "loss": 0.4568,
+ "step": 590
+ },
+ {
+ "epoch": 0.3446295232624928,
+ "grad_norm": 4.936352252960205,
+ "learning_rate": 1.655944859276278e-05,
+ "loss": 0.3774,
+ "step": 600
+ },
+ {
+ "epoch": 0.35037334865020103,
+ "grad_norm": 7.164617538452148,
+ "learning_rate": 1.6502010338885698e-05,
+ "loss": 0.5373,
+ "step": 610
+ },
+ {
+ "epoch": 0.35611717403790927,
+ "grad_norm": 7.279328346252441,
+ "learning_rate": 1.6444572085008618e-05,
+ "loss": 0.4183,
+ "step": 620
+ },
+ {
+ "epoch": 0.36186099942561745,
+ "grad_norm": 8.805562973022461,
+ "learning_rate": 1.6387133831131535e-05,
+ "loss": 0.4232,
+ "step": 630
+ },
+ {
+ "epoch": 0.3676048248133257,
+ "grad_norm": 6.055676460266113,
+ "learning_rate": 1.632969557725445e-05,
+ "loss": 0.4316,
+ "step": 640
+ },
+ {
+ "epoch": 0.3733486502010339,
+ "grad_norm": 5.835373401641846,
+ "learning_rate": 1.627225732337737e-05,
+ "loss": 0.4573,
+ "step": 650
+ },
+ {
+ "epoch": 0.3790924755887421,
+ "grad_norm": 2.7914931774139404,
+ "learning_rate": 1.6214819069500288e-05,
+ "loss": 0.3691,
+ "step": 660
+ },
+ {
+ "epoch": 0.38483630097645033,
+ "grad_norm": 5.780324459075928,
+ "learning_rate": 1.6157380815623208e-05,
+ "loss": 0.3761,
+ "step": 670
+ },
+ {
+ "epoch": 0.3905801263641585,
+ "grad_norm": 5.7374348640441895,
+ "learning_rate": 1.6099942561746125e-05,
+ "loss": 0.4276,
+ "step": 680
+ },
+ {
+ "epoch": 0.39632395175186674,
+ "grad_norm": 4.771572589874268,
+ "learning_rate": 1.604250430786904e-05,
+ "loss": 0.4092,
+ "step": 690
+ },
+ {
+ "epoch": 0.402067777139575,
+ "grad_norm": 7.6168928146362305,
+ "learning_rate": 1.598506605399196e-05,
+ "loss": 0.4651,
+ "step": 700
+ },
+ {
+ "epoch": 0.40781160252728316,
+ "grad_norm": 6.272294998168945,
+ "learning_rate": 1.5927627800114878e-05,
+ "loss": 0.4107,
+ "step": 710
+ },
+ {
+ "epoch": 0.4135554279149914,
+ "grad_norm": 8.060046195983887,
+ "learning_rate": 1.5870189546237794e-05,
+ "loss": 0.4117,
+ "step": 720
+ },
+ {
+ "epoch": 0.41929925330269957,
+ "grad_norm": 7.290645122528076,
+ "learning_rate": 1.5812751292360714e-05,
+ "loss": 0.5005,
+ "step": 730
+ },
+ {
+ "epoch": 0.4250430786904078,
+ "grad_norm": 8.02639102935791,
+ "learning_rate": 1.575531303848363e-05,
+ "loss": 0.4561,
+ "step": 740
+ },
+ {
+ "epoch": 0.43078690407811604,
+ "grad_norm": 8.455704689025879,
+ "learning_rate": 1.5697874784606548e-05,
+ "loss": 0.4592,
+ "step": 750
+ },
+ {
+ "epoch": 0.4365307294658242,
+ "grad_norm": 2.9264323711395264,
+ "learning_rate": 1.5640436530729468e-05,
+ "loss": 0.4532,
+ "step": 760
+ },
+ {
+ "epoch": 0.44227455485353245,
+ "grad_norm": 5.246438980102539,
+ "learning_rate": 1.5582998276852384e-05,
+ "loss": 0.4931,
+ "step": 770
+ },
+ {
+ "epoch": 0.4480183802412407,
+ "grad_norm": 3.869628667831421,
+ "learning_rate": 1.5525560022975304e-05,
+ "loss": 0.3476,
+ "step": 780
+ },
+ {
+ "epoch": 0.45376220562894887,
+ "grad_norm": 4.742978572845459,
+ "learning_rate": 1.546812176909822e-05,
+ "loss": 0.4274,
+ "step": 790
+ },
+ {
+ "epoch": 0.4595060310166571,
+ "grad_norm": 6.9326300621032715,
+ "learning_rate": 1.5410683515221138e-05,
+ "loss": 0.5299,
+ "step": 800
+ },
+ {
+ "epoch": 0.4652498564043653,
+ "grad_norm": 6.16196346282959,
+ "learning_rate": 1.5353245261344058e-05,
+ "loss": 0.4805,
+ "step": 810
+ },
+ {
+ "epoch": 0.4709936817920735,
+ "grad_norm": 3.637753486633301,
+ "learning_rate": 1.5295807007466974e-05,
+ "loss": 0.4393,
+ "step": 820
+ },
+ {
+ "epoch": 0.47673750717978175,
+ "grad_norm": 3.867966413497925,
+ "learning_rate": 1.523836875358989e-05,
+ "loss": 0.4358,
+ "step": 830
+ },
+ {
+ "epoch": 0.48248133256748993,
+ "grad_norm": 4.753304481506348,
+ "learning_rate": 1.5180930499712809e-05,
+ "loss": 0.4737,
+ "step": 840
+ },
+ {
+ "epoch": 0.48822515795519816,
+ "grad_norm": 7.7434515953063965,
+ "learning_rate": 1.5123492245835727e-05,
+ "loss": 0.4466,
+ "step": 850
+ },
+ {
+ "epoch": 0.4939689833429064,
+ "grad_norm": 6.001241683959961,
+ "learning_rate": 1.5066053991958644e-05,
+ "loss": 0.4016,
+ "step": 860
+ },
+ {
+ "epoch": 0.4997128087306146,
+ "grad_norm": 7.710599422454834,
+ "learning_rate": 1.5008615738081564e-05,
+ "loss": 0.4758,
+ "step": 870
+ },
+ {
+ "epoch": 0.5054566341183228,
+ "grad_norm": 5.139091491699219,
+ "learning_rate": 1.4951177484204482e-05,
+ "loss": 0.4019,
+ "step": 880
+ },
+ {
+ "epoch": 0.511200459506031,
+ "grad_norm": 9.428766250610352,
+ "learning_rate": 1.4893739230327399e-05,
+ "loss": 0.3742,
+ "step": 890
+ },
+ {
+ "epoch": 0.5169442848937392,
+ "grad_norm": 9.071430206298828,
+ "learning_rate": 1.4836300976450317e-05,
+ "loss": 0.4183,
+ "step": 900
+ },
+ {
+ "epoch": 0.5226881102814475,
+ "grad_norm": 10.34457015991211,
+ "learning_rate": 1.4778862722573236e-05,
+ "loss": 0.437,
+ "step": 910
+ },
+ {
+ "epoch": 0.5284319356691557,
+ "grad_norm": 7.01099967956543,
+ "learning_rate": 1.4721424468696152e-05,
+ "loss": 0.4421,
+ "step": 920
+ },
+ {
+ "epoch": 0.5341757610568638,
+ "grad_norm": 3.454484701156616,
+ "learning_rate": 1.466398621481907e-05,
+ "loss": 0.4283,
+ "step": 930
+ },
+ {
+ "epoch": 0.539919586444572,
+ "grad_norm": 5.86787223815918,
+ "learning_rate": 1.4606547960941987e-05,
+ "loss": 0.4094,
+ "step": 940
+ },
+ {
+ "epoch": 0.5456634118322803,
+ "grad_norm": 8.41640567779541,
+ "learning_rate": 1.4549109707064906e-05,
+ "loss": 0.414,
+ "step": 950
+ },
+ {
+ "epoch": 0.5514072372199885,
+ "grad_norm": 6.936724662780762,
+ "learning_rate": 1.4491671453187824e-05,
+ "loss": 0.3936,
+ "step": 960
+ },
+ {
+ "epoch": 0.5571510626076968,
+ "grad_norm": 7.984805583953857,
+ "learning_rate": 1.4434233199310744e-05,
+ "loss": 0.4071,
+ "step": 970
+ },
+ {
+ "epoch": 0.562894887995405,
+ "grad_norm": 4.381742477416992,
+ "learning_rate": 1.437679494543366e-05,
+ "loss": 0.3393,
+ "step": 980
+ },
+ {
+ "epoch": 0.5686387133831131,
+ "grad_norm": 7.115252494812012,
+ "learning_rate": 1.4319356691556579e-05,
+ "loss": 0.5616,
+ "step": 990
+ },
+ {
+ "epoch": 0.5743825387708213,
+ "grad_norm": 6.60570764541626,
+ "learning_rate": 1.4261918437679495e-05,
+ "loss": 0.4633,
+ "step": 1000
+ },
+ {
+ "epoch": 0.5801263641585296,
+ "grad_norm": 9.552061080932617,
+ "learning_rate": 1.4204480183802414e-05,
+ "loss": 0.4973,
+ "step": 1010
+ },
+ {
+ "epoch": 0.5858701895462378,
+ "grad_norm": 3.6467580795288086,
+ "learning_rate": 1.4147041929925332e-05,
+ "loss": 0.3958,
+ "step": 1020
+ },
+ {
+ "epoch": 0.591614014933946,
+ "grad_norm": 2.663799524307251,
+ "learning_rate": 1.4089603676048249e-05,
+ "loss": 0.3402,
+ "step": 1030
+ },
+ {
+ "epoch": 0.5973578403216542,
+ "grad_norm": 7.468900203704834,
+ "learning_rate": 1.4032165422171167e-05,
+ "loss": 0.3341,
+ "step": 1040
+ },
+ {
+ "epoch": 0.6031016657093624,
+ "grad_norm": 10.396268844604492,
+ "learning_rate": 1.3974727168294084e-05,
+ "loss": 0.5122,
+ "step": 1050
+ },
+ {
+ "epoch": 0.6088454910970706,
+ "grad_norm": 7.79450798034668,
+ "learning_rate": 1.3917288914417002e-05,
+ "loss": 0.5581,
+ "step": 1060
+ },
+ {
+ "epoch": 0.6145893164847789,
+ "grad_norm": 8.077397346496582,
+ "learning_rate": 1.385985066053992e-05,
+ "loss": 0.4247,
+ "step": 1070
+ },
+ {
+ "epoch": 0.6203331418724871,
+ "grad_norm": 8.327542304992676,
+ "learning_rate": 1.380241240666284e-05,
+ "loss": 0.4122,
+ "step": 1080
+ },
+ {
+ "epoch": 0.6260769672601952,
+ "grad_norm": 7.940774917602539,
+ "learning_rate": 1.3744974152785757e-05,
+ "loss": 0.5199,
+ "step": 1090
+ },
+ {
+ "epoch": 0.6318207926479035,
+ "grad_norm": 5.148271560668945,
+ "learning_rate": 1.3687535898908675e-05,
+ "loss": 0.4534,
+ "step": 1100
+ },
+ {
+ "epoch": 0.6375646180356117,
+ "grad_norm": 7.042996883392334,
+ "learning_rate": 1.3630097645031592e-05,
+ "loss": 0.4737,
+ "step": 1110
+ },
+ {
+ "epoch": 0.6433084434233199,
+ "grad_norm": 5.131284236907959,
+ "learning_rate": 1.357265939115451e-05,
+ "loss": 0.3637,
+ "step": 1120
+ },
+ {
+ "epoch": 0.6490522688110282,
+ "grad_norm": 10.73865795135498,
+ "learning_rate": 1.3515221137277428e-05,
+ "loss": 0.4152,
+ "step": 1130
+ },
+ {
+ "epoch": 0.6547960941987364,
+ "grad_norm": 6.061016082763672,
+ "learning_rate": 1.3457782883400345e-05,
+ "loss": 0.343,
+ "step": 1140
+ },
+ {
+ "epoch": 0.6605399195864445,
+ "grad_norm": 11.200201988220215,
+ "learning_rate": 1.3400344629523263e-05,
+ "loss": 0.4781,
+ "step": 1150
+ },
+ {
+ "epoch": 0.6662837449741528,
+ "grad_norm": 6.987539768218994,
+ "learning_rate": 1.334290637564618e-05,
+ "loss": 0.4046,
+ "step": 1160
+ },
+ {
+ "epoch": 0.672027570361861,
+ "grad_norm": 7.3787713050842285,
+ "learning_rate": 1.3285468121769098e-05,
+ "loss": 0.4136,
+ "step": 1170
+ },
+ {
+ "epoch": 0.6777713957495692,
+ "grad_norm": 8.829427719116211,
+ "learning_rate": 1.3228029867892018e-05,
+ "loss": 0.3807,
+ "step": 1180
+ },
+ {
+ "epoch": 0.6835152211372775,
+ "grad_norm": 9.648842811584473,
+ "learning_rate": 1.3170591614014937e-05,
+ "loss": 0.3273,
+ "step": 1190
+ },
+ {
+ "epoch": 0.6892590465249856,
+ "grad_norm": 7.307587146759033,
+ "learning_rate": 1.3113153360137853e-05,
+ "loss": 0.3351,
+ "step": 1200
+ },
+ {
+ "epoch": 0.6950028719126938,
+ "grad_norm": 6.445584297180176,
+ "learning_rate": 1.3055715106260772e-05,
+ "loss": 0.4776,
+ "step": 1210
+ },
+ {
+ "epoch": 0.7007466973004021,
+ "grad_norm": 7.078349590301514,
+ "learning_rate": 1.2998276852383688e-05,
+ "loss": 0.4438,
+ "step": 1220
+ },
+ {
+ "epoch": 0.7064905226881103,
+ "grad_norm": 9.63571834564209,
+ "learning_rate": 1.2940838598506606e-05,
+ "loss": 0.4027,
+ "step": 1230
+ },
+ {
+ "epoch": 0.7122343480758185,
+ "grad_norm": 3.2861080169677734,
+ "learning_rate": 1.2883400344629525e-05,
+ "loss": 0.3334,
+ "step": 1240
+ },
+ {
+ "epoch": 0.7179781734635267,
+ "grad_norm": 7.433917999267578,
+ "learning_rate": 1.2825962090752441e-05,
+ "loss": 0.4418,
+ "step": 1250
+ },
+ {
+ "epoch": 0.7237219988512349,
+ "grad_norm": 7.511765003204346,
+ "learning_rate": 1.276852383687536e-05,
+ "loss": 0.3693,
+ "step": 1260
+ },
+ {
+ "epoch": 0.7294658242389431,
+ "grad_norm": 9.38160228729248,
+ "learning_rate": 1.2711085582998276e-05,
+ "loss": 0.3989,
+ "step": 1270
+ },
+ {
+ "epoch": 0.7352096496266514,
+ "grad_norm": 2.012756586074829,
+ "learning_rate": 1.2653647329121195e-05,
+ "loss": 0.3867,
+ "step": 1280
+ },
+ {
+ "epoch": 0.7409534750143596,
+ "grad_norm": 6.777096748352051,
+ "learning_rate": 1.2596209075244115e-05,
+ "loss": 0.5528,
+ "step": 1290
+ },
+ {
+ "epoch": 0.7466973004020678,
+ "grad_norm": 6.928145885467529,
+ "learning_rate": 1.2538770821367033e-05,
+ "loss": 0.3403,
+ "step": 1300
+ },
+ {
+ "epoch": 0.752441125789776,
+ "grad_norm": 7.99967622756958,
+ "learning_rate": 1.248133256748995e-05,
+ "loss": 0.5006,
+ "step": 1310
+ },
+ {
+ "epoch": 0.7581849511774842,
+ "grad_norm": 4.8364033699035645,
+ "learning_rate": 1.2423894313612868e-05,
+ "loss": 0.4266,
+ "step": 1320
+ },
+ {
+ "epoch": 0.7639287765651924,
+ "grad_norm": 4.017831802368164,
+ "learning_rate": 1.2366456059735785e-05,
+ "loss": 0.3444,
+ "step": 1330
+ },
+ {
+ "epoch": 0.7696726019529007,
+ "grad_norm": 5.27449893951416,
+ "learning_rate": 1.2309017805858703e-05,
+ "loss": 0.3962,
+ "step": 1340
+ },
+ {
+ "epoch": 0.7754164273406089,
+ "grad_norm": 7.989853858947754,
+ "learning_rate": 1.2251579551981621e-05,
+ "loss": 0.4172,
+ "step": 1350
+ },
+ {
+ "epoch": 0.781160252728317,
+ "grad_norm": 5.878440856933594,
+ "learning_rate": 1.2194141298104538e-05,
+ "loss": 0.4723,
+ "step": 1360
+ },
+ {
+ "epoch": 0.7869040781160253,
+ "grad_norm": 9.140411376953125,
+ "learning_rate": 1.2136703044227456e-05,
+ "loss": 0.4012,
+ "step": 1370
+ },
+ {
+ "epoch": 0.7926479035037335,
+ "grad_norm": 3.9119012355804443,
+ "learning_rate": 1.2079264790350373e-05,
+ "loss": 0.3545,
+ "step": 1380
+ },
+ {
+ "epoch": 0.7983917288914417,
+ "grad_norm": 6.248933792114258,
+ "learning_rate": 1.2021826536473291e-05,
+ "loss": 0.4787,
+ "step": 1390
+ },
+ {
+ "epoch": 0.80413555427915,
+ "grad_norm": 8.478959083557129,
+ "learning_rate": 1.1964388282596211e-05,
+ "loss": 0.4077,
+ "step": 1400
+ },
+ {
+ "epoch": 0.8098793796668581,
+ "grad_norm": 6.234384059906006,
+ "learning_rate": 1.190695002871913e-05,
+ "loss": 0.4044,
+ "step": 1410
+ },
+ {
+ "epoch": 0.8156232050545663,
+ "grad_norm": 5.093031883239746,
+ "learning_rate": 1.1849511774842046e-05,
+ "loss": 0.3298,
+ "step": 1420
+ },
+ {
+ "epoch": 0.8213670304422745,
+ "grad_norm": 5.755350112915039,
+ "learning_rate": 1.1792073520964964e-05,
+ "loss": 0.4099,
+ "step": 1430
+ },
+ {
+ "epoch": 0.8271108558299828,
+ "grad_norm": 9.269704818725586,
+ "learning_rate": 1.1734635267087881e-05,
+ "loss": 0.366,
+ "step": 1440
+ },
+ {
+ "epoch": 0.832854681217691,
+ "grad_norm": 4.977533340454102,
+ "learning_rate": 1.16771970132108e-05,
+ "loss": 0.3156,
+ "step": 1450
+ },
+ {
+ "epoch": 0.8385985066053991,
+ "grad_norm": 6.767063140869141,
+ "learning_rate": 1.1619758759333718e-05,
+ "loss": 0.3966,
+ "step": 1460
+ },
1034
+ {
1035
+ "epoch": 0.8443423319931074,
1036
+ "grad_norm": 6.855627536773682,
1037
+ "learning_rate": 1.1562320505456634e-05,
1038
+ "loss": 0.3488,
1039
+ "step": 1470
1040
+ },
1041
+ {
1042
+ "epoch": 0.8500861573808156,
1043
+ "grad_norm": 3.408679723739624,
1044
+ "learning_rate": 1.1504882251579552e-05,
1045
+ "loss": 0.4016,
1046
+ "step": 1480
1047
+ },
1048
+ {
1049
+ "epoch": 0.8558299827685238,
1050
+ "grad_norm": 8.43376636505127,
1051
+ "learning_rate": 1.1447443997702469e-05,
1052
+ "loss": 0.3951,
1053
+ "step": 1490
1054
+ },
1055
+ {
1056
+ "epoch": 0.8615738081562321,
1057
+ "grad_norm": 7.106573104858398,
1058
+ "learning_rate": 1.1390005743825389e-05,
1059
+ "loss": 0.3056,
1060
+ "step": 1500
1061
+ },
1062
+ {
1063
+ "epoch": 0.8673176335439403,
1064
+ "grad_norm": 3.373734474182129,
1065
+ "learning_rate": 1.1332567489948307e-05,
1066
+ "loss": 0.4548,
1067
+ "step": 1510
1068
+ },
1069
+ {
1070
+ "epoch": 0.8730614589316484,
1071
+ "grad_norm": 4.657841205596924,
1072
+ "learning_rate": 1.1275129236071226e-05,
1073
+ "loss": 0.36,
1074
+ "step": 1520
1075
+ },
1076
+ {
1077
+ "epoch": 0.8788052843193567,
1078
+ "grad_norm": 8.218329429626465,
1079
+ "learning_rate": 1.1217690982194142e-05,
1080
+ "loss": 0.4297,
1081
+ "step": 1530
1082
+ },
1083
+ {
1084
+ "epoch": 0.8845491097070649,
1085
+ "grad_norm": 7.709052562713623,
1086
+ "learning_rate": 1.116025272831706e-05,
1087
+ "loss": 0.3766,
1088
+ "step": 1540
1089
+ },
1090
+ {
1091
+ "epoch": 0.8902929350947731,
1092
+ "grad_norm": 6.875143527984619,
1093
+ "learning_rate": 1.1102814474439977e-05,
1094
+ "loss": 0.4106,
1095
+ "step": 1550
1096
+ },
1097
+ {
1098
+ "epoch": 0.8960367604824814,
1099
+ "grad_norm": 5.460892200469971,
1100
+ "learning_rate": 1.1045376220562896e-05,
1101
+ "loss": 0.3016,
1102
+ "step": 1560
1103
+ },
1104
+ {
1105
+ "epoch": 0.9017805858701895,
1106
+ "grad_norm": 3.4830429553985596,
1107
+ "learning_rate": 1.0987937966685814e-05,
1108
+ "loss": 0.371,
1109
+ "step": 1570
1110
+ },
1111
+ {
1112
+ "epoch": 0.9075244112578977,
1113
+ "grad_norm": 8.233579635620117,
1114
+ "learning_rate": 1.093049971280873e-05,
1115
+ "loss": 0.4591,
1116
+ "step": 1580
1117
+ },
1118
+ {
1119
+ "epoch": 0.913268236645606,
1120
+ "grad_norm": 7.001081466674805,
1121
+ "learning_rate": 1.0873061458931649e-05,
1122
+ "loss": 0.3856,
1123
+ "step": 1590
1124
+ },
1125
+ {
1126
+ "epoch": 0.9190120620333142,
1127
+ "grad_norm": 7.473963260650635,
1128
+ "learning_rate": 1.0815623205054565e-05,
1129
+ "loss": 0.4937,
1130
+ "step": 1600
1131
+ },
1132
+ {
1133
+ "epoch": 0.9247558874210224,
1134
+ "grad_norm": 4.046863079071045,
1135
+ "learning_rate": 1.0758184951177485e-05,
1136
+ "loss": 0.4141,
1137
+ "step": 1610
1138
+ },
1139
+ {
1140
+ "epoch": 0.9304997128087306,
1141
+ "grad_norm": 5.425885200500488,
1142
+ "learning_rate": 1.0700746697300404e-05,
1143
+ "loss": 0.3545,
1144
+ "step": 1620
1145
+ },
1146
+ {
1147
+ "epoch": 0.9362435381964388,
1148
+ "grad_norm": 5.255967140197754,
1149
+ "learning_rate": 1.0643308443423322e-05,
1150
+ "loss": 0.3881,
1151
+ "step": 1630
1152
+ },
1153
+ {
1154
+ "epoch": 0.941987363584147,
1155
+ "grad_norm": 7.365703105926514,
1156
+ "learning_rate": 1.0585870189546239e-05,
1157
+ "loss": 0.3648,
1158
+ "step": 1640
1159
+ },
1160
+ {
1161
+ "epoch": 0.9477311889718553,
1162
+ "grad_norm": 5.149658679962158,
1163
+ "learning_rate": 1.0528431935669157e-05,
1164
+ "loss": 0.2952,
1165
+ "step": 1650
1166
+ },
1167
+ {
1168
+ "epoch": 0.9534750143595635,
1169
+ "grad_norm": 5.5968194007873535,
1170
+ "learning_rate": 1.0470993681792074e-05,
1171
+ "loss": 0.3604,
1172
+ "step": 1660
1173
+ },
1174
+ {
1175
+ "epoch": 0.9592188397472717,
1176
+ "grad_norm": 6.176368713378906,
1177
+ "learning_rate": 1.0413555427914992e-05,
1178
+ "loss": 0.3584,
1179
+ "step": 1670
1180
+ },
1181
+ {
1182
+ "epoch": 0.9649626651349799,
1183
+ "grad_norm": 5.876338958740234,
1184
+ "learning_rate": 1.035611717403791e-05,
1185
+ "loss": 0.4076,
1186
+ "step": 1680
1187
+ },
1188
+ {
1189
+ "epoch": 0.9707064905226881,
1190
+ "grad_norm": 11.697908401489258,
1191
+ "learning_rate": 1.0298678920160827e-05,
1192
+ "loss": 0.3648,
1193
+ "step": 1690
1194
+ },
1195
+ {
1196
+ "epoch": 0.9764503159103963,
1197
+ "grad_norm": 7.04163122177124,
1198
+ "learning_rate": 1.0241240666283745e-05,
1199
+ "loss": 0.349,
1200
+ "step": 1700
1201
+ },
1202
+ {
1203
+ "epoch": 0.9821941412981046,
1204
+ "grad_norm": 6.7707133293151855,
1205
+ "learning_rate": 1.0183802412406662e-05,
1206
+ "loss": 0.3888,
1207
+ "step": 1710
1208
+ },
1209
+ {
1210
+ "epoch": 0.9879379666858128,
1211
+ "grad_norm": 3.8108270168304443,
1212
+ "learning_rate": 1.0126364158529582e-05,
1213
+ "loss": 0.3753,
1214
+ "step": 1720
1215
+ },
1216
+ {
1217
+ "epoch": 0.9936817920735209,
1218
+ "grad_norm": 11.013320922851562,
1219
+ "learning_rate": 1.00689259046525e-05,
1220
+ "loss": 0.3977,
1221
+ "step": 1730
1222
+ },
1223
+ {
1224
+ "epoch": 0.9994256174612292,
1225
+ "grad_norm": 4.042791843414307,
1226
+ "learning_rate": 1.0011487650775419e-05,
1227
+ "loss": 0.3724,
1228
+ "step": 1740
1229
+ },
1230
+ {
1231
+ "epoch": 1.0051694428489375,
1232
+ "grad_norm": 6.258309841156006,
1233
+ "learning_rate": 9.954049396898335e-06,
1234
+ "loss": 0.3591,
1235
+ "step": 1750
1236
+ },
1237
+ {
1238
+ "epoch": 1.0109132682366455,
1239
+ "grad_norm": 7.884782314300537,
1240
+ "learning_rate": 9.896611143021253e-06,
1241
+ "loss": 0.4114,
1242
+ "step": 1760
1243
+ },
1244
+ {
1245
+ "epoch": 1.0166570936243537,
1246
+ "grad_norm": 4.9663567543029785,
1247
+ "learning_rate": 9.83917288914417e-06,
1248
+ "loss": 0.4462,
1249
+ "step": 1770
1250
+ },
1251
+ {
1252
+ "epoch": 1.022400919012062,
1253
+ "grad_norm": 7.046320915222168,
1254
+ "learning_rate": 9.781734635267088e-06,
1255
+ "loss": 0.3386,
1256
+ "step": 1780
1257
+ },
1258
+ {
1259
+ "epoch": 1.0281447443997702,
1260
+ "grad_norm": 6.846945762634277,
1261
+ "learning_rate": 9.724296381390007e-06,
1262
+ "loss": 0.3181,
1263
+ "step": 1790
1264
+ },
1265
+ {
1266
+ "epoch": 1.0338885697874785,
1267
+ "grad_norm": 5.925526142120361,
1268
+ "learning_rate": 9.666858127512925e-06,
1269
+ "loss": 0.3357,
1270
+ "step": 1800
1271
+ },
1272
+ {
1273
+ "epoch": 1.0396323951751867,
1274
+ "grad_norm": 14.302725791931152,
1275
+ "learning_rate": 9.609419873635842e-06,
1276
+ "loss": 0.2859,
1277
+ "step": 1810
1278
+ },
1279
+ {
1280
+ "epoch": 1.045376220562895,
1281
+ "grad_norm": 8.27291488647461,
1282
+ "learning_rate": 9.55198161975876e-06,
1283
+ "loss": 0.3688,
1284
+ "step": 1820
1285
+ },
1286
+ {
1287
+ "epoch": 1.0511200459506032,
1288
+ "grad_norm": 5.266950607299805,
1289
+ "learning_rate": 9.494543365881678e-06,
1290
+ "loss": 0.3106,
1291
+ "step": 1830
1292
+ },
1293
+ {
1294
+ "epoch": 1.0568638713383114,
1295
+ "grad_norm": 11.347005844116211,
1296
+ "learning_rate": 9.437105112004595e-06,
1297
+ "loss": 0.3389,
1298
+ "step": 1840
1299
+ },
1300
+ {
1301
+ "epoch": 1.0626076967260196,
1302
+ "grad_norm": 4.7072906494140625,
1303
+ "learning_rate": 9.379666858127515e-06,
1304
+ "loss": 0.3519,
1305
+ "step": 1850
1306
+ },
1307
+ {
1308
+ "epoch": 1.0683515221137276,
1309
+ "grad_norm": 4.05309534072876,
1310
+ "learning_rate": 9.322228604250432e-06,
1311
+ "loss": 0.3006,
1312
+ "step": 1860
1313
+ },
1314
+ {
1315
+ "epoch": 1.0740953475014359,
1316
+ "grad_norm": 5.578520774841309,
1317
+ "learning_rate": 9.26479035037335e-06,
1318
+ "loss": 0.3645,
1319
+ "step": 1870
1320
+ },
1321
+ {
1322
+ "epoch": 1.079839172889144,
1323
+ "grad_norm": 7.405791282653809,
1324
+ "learning_rate": 9.207352096496266e-06,
1325
+ "loss": 0.3104,
1326
+ "step": 1880
1327
+ },
1328
+ {
1329
+ "epoch": 1.0855829982768523,
1330
+ "grad_norm": 9.269173622131348,
1331
+ "learning_rate": 9.149913842619185e-06,
1332
+ "loss": 0.3576,
1333
+ "step": 1890
1334
+ },
1335
+ {
1336
+ "epoch": 1.0913268236645606,
1337
+ "grad_norm": 5.276297569274902,
1338
+ "learning_rate": 9.092475588742103e-06,
1339
+ "loss": 0.3354,
1340
+ "step": 1900
1341
+ },
1342
+ {
1343
+ "epoch": 1.0970706490522688,
1344
+ "grad_norm": 8.320406913757324,
1345
+ "learning_rate": 9.035037334865021e-06,
1346
+ "loss": 0.3232,
1347
+ "step": 1910
1348
+ },
1349
+ {
1350
+ "epoch": 1.102814474439977,
1351
+ "grad_norm": 6.023215293884277,
1352
+ "learning_rate": 8.977599080987938e-06,
1353
+ "loss": 0.3427,
1354
+ "step": 1920
1355
+ },
1356
+ {
1357
+ "epoch": 1.1085582998276853,
1358
+ "grad_norm": 8.178590774536133,
1359
+ "learning_rate": 8.920160827110856e-06,
1360
+ "loss": 0.3103,
1361
+ "step": 1930
1362
+ },
1363
+ {
1364
+ "epoch": 1.1143021252153935,
1365
+ "grad_norm": 6.056619644165039,
1366
+ "learning_rate": 8.862722573233775e-06,
1367
+ "loss": 0.3848,
1368
+ "step": 1940
1369
+ },
1370
+ {
1371
+ "epoch": 1.1200459506031017,
1372
+ "grad_norm": 6.109485626220703,
1373
+ "learning_rate": 8.805284319356693e-06,
1374
+ "loss": 0.3102,
1375
+ "step": 1950
1376
+ },
1377
+ {
1378
+ "epoch": 1.12578977599081,
1379
+ "grad_norm": 6.949984550476074,
1380
+ "learning_rate": 8.747846065479611e-06,
1381
+ "loss": 0.3013,
1382
+ "step": 1960
1383
+ },
1384
+ {
1385
+ "epoch": 1.1315336013785182,
1386
+ "grad_norm": 4.320880889892578,
1387
+ "learning_rate": 8.690407811602528e-06,
1388
+ "loss": 0.3169,
1389
+ "step": 1970
1390
+ },
1391
+ {
1392
+ "epoch": 1.1372774267662262,
1393
+ "grad_norm": 9.62964916229248,
1394
+ "learning_rate": 8.632969557725446e-06,
1395
+ "loss": 0.3577,
1396
+ "step": 1980
1397
+ },
1398
+ {
1399
+ "epoch": 1.1430212521539345,
1400
+ "grad_norm": 7.1865105628967285,
1401
+ "learning_rate": 8.575531303848363e-06,
1402
+ "loss": 0.4245,
1403
+ "step": 1990
1404
+ },
1405
+ {
1406
+ "epoch": 1.1487650775416427,
1407
+ "grad_norm": 11.42944622039795,
1408
+ "learning_rate": 8.518093049971281e-06,
1409
+ "loss": 0.3421,
1410
+ "step": 2000
1411
+ },
1412
+ {
1413
+ "epoch": 1.154508902929351,
1414
+ "grad_norm": 10.365814208984375,
1415
+ "learning_rate": 8.4606547960942e-06,
1416
+ "loss": 0.2971,
1417
+ "step": 2010
1418
+ },
1419
+ {
1420
+ "epoch": 1.1602527283170592,
1421
+ "grad_norm": 4.546888828277588,
1422
+ "learning_rate": 8.403216542217118e-06,
1423
+ "loss": 0.3762,
1424
+ "step": 2020
1425
+ },
1426
+ {
1427
+ "epoch": 1.1659965537047674,
1428
+ "grad_norm": 9.672823905944824,
1429
+ "learning_rate": 8.345778288340034e-06,
1430
+ "loss": 0.3411,
1431
+ "step": 2030
1432
+ },
1433
+ {
1434
+ "epoch": 1.1717403790924756,
1435
+ "grad_norm": 4.738915920257568,
1436
+ "learning_rate": 8.288340034462953e-06,
1437
+ "loss": 0.3453,
1438
+ "step": 2040
1439
+ },
1440
+ {
1441
+ "epoch": 1.1774842044801839,
1442
+ "grad_norm": 10.187810897827148,
1443
+ "learning_rate": 8.230901780585871e-06,
1444
+ "loss": 0.3351,
1445
+ "step": 2050
1446
+ },
1447
+ {
1448
+ "epoch": 1.183228029867892,
1449
+ "grad_norm": 6.290671348571777,
1450
+ "learning_rate": 8.17346352670879e-06,
1451
+ "loss": 0.3286,
1452
+ "step": 2060
1453
+ },
1454
+ {
1455
+ "epoch": 1.1889718552556001,
1456
+ "grad_norm": 9.14261531829834,
1457
+ "learning_rate": 8.116025272831708e-06,
1458
+ "loss": 0.2878,
1459
+ "step": 2070
1460
+ },
1461
+ {
1462
+ "epoch": 1.1947156806433084,
1463
+ "grad_norm": 7.814758777618408,
1464
+ "learning_rate": 8.058587018954624e-06,
1465
+ "loss": 0.3469,
1466
+ "step": 2080
1467
+ },
1468
+ {
1469
+ "epoch": 1.2004595060310166,
1470
+ "grad_norm": 10.085731506347656,
1471
+ "learning_rate": 8.001148765077543e-06,
1472
+ "loss": 0.3025,
1473
+ "step": 2090
1474
+ },
1475
+ {
1476
+ "epoch": 1.2062033314187248,
1477
+ "grad_norm": 10.734376907348633,
1478
+ "learning_rate": 7.94371051120046e-06,
1479
+ "loss": 0.2442,
1480
+ "step": 2100
1481
+ },
1482
+ {
1483
+ "epoch": 1.211947156806433,
1484
+ "grad_norm": 13.61286735534668,
1485
+ "learning_rate": 7.88627225732338e-06,
1486
+ "loss": 0.3063,
1487
+ "step": 2110
1488
+ },
1489
+ {
1490
+ "epoch": 1.2176909821941413,
1491
+ "grad_norm": 8.572850227355957,
1492
+ "learning_rate": 7.828834003446296e-06,
1493
+ "loss": 0.3863,
1494
+ "step": 2120
1495
+ },
1496
+ {
1497
+ "epoch": 1.2234348075818495,
1498
+ "grad_norm": 6.247170448303223,
1499
+ "learning_rate": 7.771395749569214e-06,
1500
+ "loss": 0.3214,
1501
+ "step": 2130
1502
+ },
1503
+ {
1504
+ "epoch": 1.2291786329695578,
1505
+ "grad_norm": 7.438636779785156,
1506
+ "learning_rate": 7.71395749569213e-06,
1507
+ "loss": 0.381,
1508
+ "step": 2140
1509
+ },
1510
+ {
1511
+ "epoch": 1.234922458357266,
1512
+ "grad_norm": 6.11846399307251,
1513
+ "learning_rate": 7.656519241815049e-06,
1514
+ "loss": 0.3556,
1515
+ "step": 2150
1516
+ },
1517
+ {
1518
+ "epoch": 1.2406662837449742,
1519
+ "grad_norm": 10.697092056274414,
1520
+ "learning_rate": 7.5990809879379666e-06,
1521
+ "loss": 0.3767,
1522
+ "step": 2160
1523
+ },
1524
+ {
1525
+ "epoch": 1.2464101091326825,
1526
+ "grad_norm": 5.3118205070495605,
1527
+ "learning_rate": 7.541642734060886e-06,
1528
+ "loss": 0.3112,
1529
+ "step": 2170
1530
+ },
1531
+ {
1532
+ "epoch": 1.2521539345203907,
1533
+ "grad_norm": 5.907925128936768,
1534
+ "learning_rate": 7.484204480183803e-06,
1535
+ "loss": 0.2833,
1536
+ "step": 2180
1537
+ },
1538
+ {
1539
+ "epoch": 1.2578977599080987,
1540
+ "grad_norm": 7.271302223205566,
1541
+ "learning_rate": 7.426766226306721e-06,
1542
+ "loss": 0.1935,
1543
+ "step": 2190
1544
+ },
1545
+ {
1546
+ "epoch": 1.263641585295807,
1547
+ "grad_norm": 12.389423370361328,
1548
+ "learning_rate": 7.369327972429638e-06,
1549
+ "loss": 0.3946,
1550
+ "step": 2200
1551
+ },
1552
+ {
1553
+ "epoch": 1.2693854106835152,
1554
+ "grad_norm": 9.09422492980957,
1555
+ "learning_rate": 7.3118897185525564e-06,
1556
+ "loss": 0.3191,
1557
+ "step": 2210
1558
+ },
1559
+ {
1560
+ "epoch": 1.2751292360712234,
1561
+ "grad_norm": 8.75156307220459,
1562
+ "learning_rate": 7.254451464675475e-06,
1563
+ "loss": 0.4077,
1564
+ "step": 2220
1565
+ },
1566
+ {
1567
+ "epoch": 1.2808730614589316,
1568
+ "grad_norm": 7.306863784790039,
1569
+ "learning_rate": 7.197013210798392e-06,
1570
+ "loss": 0.3295,
1571
+ "step": 2230
1572
+ },
1573
+ {
1574
+ "epoch": 1.2866168868466399,
1575
+ "grad_norm": 9.715473175048828,
1576
+ "learning_rate": 7.1395749569213105e-06,
1577
+ "loss": 0.4163,
1578
+ "step": 2240
1579
+ },
1580
+ {
1581
+ "epoch": 1.2923607122343481,
1582
+ "grad_norm": 6.315252780914307,
1583
+ "learning_rate": 7.082136703044228e-06,
1584
+ "loss": 0.3817,
1585
+ "step": 2250
1586
+ },
1587
+ {
1588
+ "epoch": 1.2981045376220564,
1589
+ "grad_norm": 8.821859359741211,
1590
+ "learning_rate": 7.0246984491671455e-06,
1591
+ "loss": 0.3265,
1592
+ "step": 2260
1593
+ },
1594
+ {
1595
+ "epoch": 1.3038483630097644,
1596
+ "grad_norm": 6.838233947753906,
1597
+ "learning_rate": 6.967260195290065e-06,
1598
+ "loss": 0.2966,
1599
+ "step": 2270
1600
+ },
1601
+ {
1602
+ "epoch": 1.3095921883974726,
1603
+ "grad_norm": 9.925073623657227,
1604
+ "learning_rate": 6.909821941412982e-06,
1605
+ "loss": 0.4484,
1606
+ "step": 2280
1607
+ },
1608
+ {
1609
+ "epoch": 1.3153360137851808,
1610
+ "grad_norm": 5.026411056518555,
1611
+ "learning_rate": 6.8523836875358996e-06,
1612
+ "loss": 0.4647,
1613
+ "step": 2290
1614
+ },
1615
+ {
1616
+ "epoch": 1.321079839172889,
1617
+ "grad_norm": 4.3956732749938965,
1618
+ "learning_rate": 6.794945433658817e-06,
1619
+ "loss": 0.2968,
1620
+ "step": 2300
1621
+ },
1622
+ {
1623
+ "epoch": 1.3268236645605973,
1624
+ "grad_norm": 6.904971599578857,
1625
+ "learning_rate": 6.7375071797817345e-06,
1626
+ "loss": 0.2888,
1627
+ "step": 2310
1628
+ },
1629
+ {
1630
+ "epoch": 1.3325674899483055,
1631
+ "grad_norm": 3.1684281826019287,
1632
+ "learning_rate": 6.680068925904653e-06,
1633
+ "loss": 0.2975,
1634
+ "step": 2320
1635
+ },
1636
+ {
1637
+ "epoch": 1.3383113153360138,
1638
+ "grad_norm": 7.3333420753479,
1639
+ "learning_rate": 6.622630672027571e-06,
1640
+ "loss": 0.3911,
1641
+ "step": 2330
1642
+ },
1643
+ {
1644
+ "epoch": 1.344055140723722,
1645
+ "grad_norm": 7.822445392608643,
1646
+ "learning_rate": 6.565192418150489e-06,
1647
+ "loss": 0.3199,
1648
+ "step": 2340
1649
+ },
1650
+ {
1651
+ "epoch": 1.3497989661114302,
1652
+ "grad_norm": 9.02872371673584,
1653
+ "learning_rate": 6.507754164273407e-06,
1654
+ "loss": 0.4698,
1655
+ "step": 2350
1656
+ },
1657
+ {
1658
+ "epoch": 1.3555427914991385,
1659
+ "grad_norm": 3.9332520961761475,
1660
+ "learning_rate": 6.450315910396324e-06,
1661
+ "loss": 0.3651,
1662
+ "step": 2360
1663
+ },
1664
+ {
1665
+ "epoch": 1.3612866168868467,
1666
+ "grad_norm": 7.590347766876221,
1667
+ "learning_rate": 6.392877656519242e-06,
1668
+ "loss": 0.3121,
1669
+ "step": 2370
1670
+ },
1671
+ {
1672
+ "epoch": 1.367030442274555,
1673
+ "grad_norm": 8.964584350585938,
1674
+ "learning_rate": 6.335439402642161e-06,
1675
+ "loss": 0.271,
1676
+ "step": 2380
1677
+ },
1678
+ {
1679
+ "epoch": 1.3727742676622632,
1680
+ "grad_norm": 8.058918952941895,
1681
+ "learning_rate": 6.2780011487650785e-06,
1682
+ "loss": 0.3906,
1683
+ "step": 2390
1684
+ },
1685
+ {
1686
+ "epoch": 1.3785180930499714,
1687
+ "grad_norm": 6.742099761962891,
1688
+ "learning_rate": 6.220562894887996e-06,
1689
+ "loss": 0.3072,
1690
+ "step": 2400
1691
+ },
1692
+ {
1693
+ "epoch": 1.3842619184376794,
1694
+ "grad_norm": 5.961569309234619,
1695
+ "learning_rate": 6.1631246410109134e-06,
1696
+ "loss": 0.3673,
1697
+ "step": 2410
1698
+ },
1699
+ {
1700
+ "epoch": 1.3900057438253877,
1701
+ "grad_norm": 9.705893516540527,
1702
+ "learning_rate": 6.105686387133831e-06,
1703
+ "loss": 0.3544,
1704
+ "step": 2420
1705
+ },
1706
+ {
1707
+ "epoch": 1.395749569213096,
1708
+ "grad_norm": 4.435375690460205,
1709
+ "learning_rate": 6.04824813325675e-06,
1710
+ "loss": 0.2372,
1711
+ "step": 2430
1712
+ },
1713
+ {
1714
+ "epoch": 1.4014933946008041,
1715
+ "grad_norm": 5.375720977783203,
1716
+ "learning_rate": 5.9908098793796675e-06,
1717
+ "loss": 0.2264,
1718
+ "step": 2440
1719
+ },
1720
+ {
1721
+ "epoch": 1.4072372199885124,
1722
+ "grad_norm": 5.602358818054199,
1723
+ "learning_rate": 5.933371625502585e-06,
1724
+ "loss": 0.3449,
1725
+ "step": 2450
1726
+ },
1727
+ {
1728
+ "epoch": 1.4129810453762206,
1729
+ "grad_norm": 10.811373710632324,
1730
+ "learning_rate": 5.875933371625503e-06,
1731
+ "loss": 0.3663,
1732
+ "step": 2460
1733
+ },
1734
+ {
1735
+ "epoch": 1.4187248707639288,
1736
+ "grad_norm": 10.196518898010254,
1737
+ "learning_rate": 5.818495117748421e-06,
1738
+ "loss": 0.3546,
1739
+ "step": 2470
1740
+ },
1741
+ {
1742
+ "epoch": 1.424468696151637,
1743
+ "grad_norm": 10.06306266784668,
1744
+ "learning_rate": 5.761056863871339e-06,
1745
+ "loss": 0.3282,
1746
+ "step": 2480
1747
+ },
1748
+ {
1749
+ "epoch": 1.430212521539345,
1750
+ "grad_norm": 4.978325843811035,
1751
+ "learning_rate": 5.703618609994257e-06,
1752
+ "loss": 0.2961,
1753
+ "step": 2490
1754
+ },
1755
+ {
1756
+ "epoch": 1.4359563469270533,
1757
+ "grad_norm": 10.731146812438965,
1758
+ "learning_rate": 5.646180356117175e-06,
1759
+ "loss": 0.3349,
1760
+ "step": 2500
1761
+ },
1762
+ {
1763
+ "epoch": 1.4417001723147616,
1764
+ "grad_norm": 8.913891792297363,
1765
+ "learning_rate": 5.588742102240092e-06,
1766
+ "loss": 0.3131,
1767
+ "step": 2510
1768
+ },
1769
+ {
1770
+ "epoch": 1.4474439977024698,
1771
+ "grad_norm": 5.1745195388793945,
1772
+ "learning_rate": 5.53130384836301e-06,
1773
+ "loss": 0.3842,
1774
+ "step": 2520
1775
+ },
1776
+ {
1777
+ "epoch": 1.453187823090178,
1778
+ "grad_norm": 8.361491203308105,
1779
+ "learning_rate": 5.473865594485927e-06,
1780
+ "loss": 0.3598,
1781
+ "step": 2530
1782
+ },
1783
+ {
1784
+ "epoch": 1.4589316484778863,
1785
+ "grad_norm": 6.487078666687012,
1786
+ "learning_rate": 5.4164273406088464e-06,
1787
+ "loss": 0.3244,
1788
+ "step": 2540
1789
+ },
1790
+ {
1791
+ "epoch": 1.4646754738655945,
1792
+ "grad_norm": 4.129726409912109,
1793
+ "learning_rate": 5.358989086731764e-06,
1794
+ "loss": 0.3152,
1795
+ "step": 2550
1796
+ },
1797
+ {
1798
+ "epoch": 1.4704192992533027,
1799
+ "grad_norm": 9.363592147827148,
1800
+ "learning_rate": 5.301550832854681e-06,
1801
+ "loss": 0.3321,
1802
+ "step": 2560
1803
+ },
1804
+ {
1805
+ "epoch": 1.476163124641011,
1806
+ "grad_norm": 6.334773063659668,
1807
+ "learning_rate": 5.2441125789776e-06,
1808
+ "loss": 0.3312,
1809
+ "step": 2570
1810
+ },
1811
+ {
1812
+ "epoch": 1.4819069500287192,
1813
+ "grad_norm": 7.404930114746094,
1814
+ "learning_rate": 5.186674325100517e-06,
1815
+ "loss": 0.3739,
1816
+ "step": 2580
1817
+ },
1818
+ {
1819
+ "epoch": 1.4876507754164274,
1820
+ "grad_norm": 7.487016201019287,
1821
+ "learning_rate": 5.1292360712234355e-06,
1822
+ "loss": 0.2865,
1823
+ "step": 2590
1824
+ },
1825
+ {
1826
+ "epoch": 1.4933946008041357,
1827
+ "grad_norm": 13.322307586669922,
1828
+ "learning_rate": 5.071797817346353e-06,
1829
+ "loss": 0.3444,
1830
+ "step": 2600
1831
+ },
1832
+ {
1833
+ "epoch": 1.499138426191844,
1834
+ "grad_norm": 9.053878784179688,
1835
+ "learning_rate": 5.014359563469271e-06,
1836
+ "loss": 0.3799,
1837
+ "step": 2610
1838
+ },
1839
+ {
1840
+ "epoch": 1.5048822515795521,
1841
+ "grad_norm": 4.018943786621094,
1842
+ "learning_rate": 4.956921309592189e-06,
1843
+ "loss": 0.2567,
1844
+ "step": 2620
1845
+ },
1846
+ {
1847
+ "epoch": 1.5106260769672601,
1848
+ "grad_norm": 2.2457354068756104,
1849
+ "learning_rate": 4.899483055715107e-06,
1850
+ "loss": 0.2978,
1851
+ "step": 2630
1852
+ },
1853
+ {
1854
+ "epoch": 1.5163699023549684,
1855
+ "grad_norm": 4.894889831542969,
1856
+ "learning_rate": 4.8420448018380245e-06,
1857
+ "loss": 0.2861,
1858
+ "step": 2640
1859
+ },
1860
+ {
1861
+ "epoch": 1.5221137277426766,
1862
+ "grad_norm": 6.843629360198975,
1863
+ "learning_rate": 4.784606547960942e-06,
1864
+ "loss": 0.3254,
1865
+ "step": 2650
1866
+ },
1867
+ {
1868
+ "epoch": 1.5278575531303848,
1869
+ "grad_norm": 7.173573970794678,
1870
+ "learning_rate": 4.72716829408386e-06,
1871
+ "loss": 0.3481,
1872
+ "step": 2660
1873
+ },
1874
+ {
1875
+ "epoch": 1.533601378518093,
1876
+ "grad_norm": 12.328543663024902,
1877
+ "learning_rate": 4.669730040206778e-06,
1878
+ "loss": 0.3124,
1879
+ "step": 2670
1880
+ },
1881
+ {
1882
+ "epoch": 1.5393452039058013,
1883
+ "grad_norm": 9.337592124938965,
1884
+ "learning_rate": 4.612291786329696e-06,
1885
+ "loss": 0.3005,
1886
+ "step": 2680
1887
+ },
1888
+ {
1889
+ "epoch": 1.5450890292935093,
1890
+ "grad_norm": 5.477969646453857,
1891
+ "learning_rate": 4.5548535324526135e-06,
1892
+ "loss": 0.3457,
1893
+ "step": 2690
1894
+ },
1895
+ {
1896
+ "epoch": 1.5508328546812176,
1897
+ "grad_norm": 5.083920955657959,
1898
+ "learning_rate": 4.497415278575532e-06,
1899
+ "loss": 0.2858,
1900
+ "step": 2700
1901
+ },
1902
+ {
1903
+ "epoch": 1.5565766800689258,
1904
+ "grad_norm": 6.250855445861816,
1905
+ "learning_rate": 4.439977024698449e-06,
1906
+ "loss": 0.2673,
1907
+ "step": 2710
1908
+ },
1909
+ {
1910
+ "epoch": 1.562320505456634,
1911
+ "grad_norm": 6.169952392578125,
1912
+ "learning_rate": 4.382538770821368e-06,
1913
+ "loss": 0.2971,
1914
+ "step": 2720
1915
+ },
1916
+ {
1917
+ "epoch": 1.5680643308443423,
1918
+ "grad_norm": 8.261754989624023,
1919
+ "learning_rate": 4.325100516944285e-06,
1920
+ "loss": 0.3009,
1921
+ "step": 2730
1922
+ },
1923
+ {
1924
+ "epoch": 1.5738081562320505,
1925
+ "grad_norm": 8.477384567260742,
1926
+ "learning_rate": 4.267662263067203e-06,
1927
+ "loss": 0.3017,
1928
+ "step": 2740
1929
+ },
1930
+ {
1931
+ "epoch": 1.5795519816197587,
1932
+ "grad_norm": 8.52374267578125,
1933
+ "learning_rate": 4.210224009190121e-06,
1934
+ "loss": 0.4511,
1935
+ "step": 2750
1936
+ },
1937
+ {
1938
+ "epoch": 1.585295807007467,
1939
+ "grad_norm": 8.646858215332031,
1940
+ "learning_rate": 4.152785755313039e-06,
1941
+ "loss": 0.2699,
1942
+ "step": 2760
1943
+ },
1944
+ {
1945
+ "epoch": 1.5910396323951752,
1946
+ "grad_norm": 7.500174522399902,
1947
+ "learning_rate": 4.095347501435957e-06,
1948
+ "loss": 0.2743,
1949
+ "step": 2770
1950
+ },
1951
+ {
1952
+ "epoch": 1.5967834577828834,
1953
+ "grad_norm": 5.454465389251709,
1954
+ "learning_rate": 4.037909247558874e-06,
1955
+ "loss": 0.3023,
1956
+ "step": 2780
1957
+ },
1958
+ {
1959
+ "epoch": 1.6025272831705917,
1960
+ "grad_norm": 2.6998019218444824,
1961
+ "learning_rate": 3.9804709936817925e-06,
1962
+ "loss": 0.3941,
1963
+ "step": 2790
1964
+ },
1965
+ {
1966
+ "epoch": 1.6082711085583,
1967
+ "grad_norm": 4.594570159912109,
1968
+ "learning_rate": 3.92303273980471e-06,
1969
+ "loss": 0.252,
1970
+ "step": 2800
1971
+ },
1972
+ {
1973
+ "epoch": 1.6140149339460081,
1974
+ "grad_norm": 5.87538480758667,
1975
+ "learning_rate": 3.865594485927628e-06,
1976
+ "loss": 0.3093,
1977
+ "step": 2810
1978
+ },
1979
+ {
1980
+ "epoch": 1.6197587593337164,
1981
+ "grad_norm": 5.358250617980957,
1982
+ "learning_rate": 3.808156232050546e-06,
1983
+ "loss": 0.3674,
1984
+ "step": 2820
1985
+ },
1986
+ {
1987
+ "epoch": 1.6255025847214246,
1988
+ "grad_norm": 4.871222972869873,
1989
+ "learning_rate": 3.7507179781734636e-06,
1990
+ "loss": 0.2407,
1991
+ "step": 2830
1992
+ },
1993
+ {
1994
+ "epoch": 1.6312464101091326,
1995
+ "grad_norm": 6.6836838722229,
1996
+ "learning_rate": 3.693279724296382e-06,
1997
+ "loss": 0.2978,
1998
+ "step": 2840
1999
+ },
2000
+ {
2001
+ "epoch": 1.6369902354968409,
2002
+ "grad_norm": 7.67780065536499,
2003
+ "learning_rate": 3.6358414704192994e-06,
2004
+ "loss": 0.3655,
2005
+ "step": 2850
2006
+ },
2007
+ {
2008
+ "epoch": 1.642734060884549,
2009
+ "grad_norm": 6.890137672424316,
2010
+ "learning_rate": 3.5784032165422173e-06,
2011
+ "loss": 0.3618,
2012
+ "step": 2860
2013
+ },
2014
+ {
2015
+ "epoch": 1.6484778862722573,
2016
+ "grad_norm": 7.769665241241455,
2017
+ "learning_rate": 3.5209649626651356e-06,
2018
+ "loss": 0.2703,
2019
+ "step": 2870
2020
+ },
2021
+ {
2022
+ "epoch": 1.6542217116599656,
2023
+ "grad_norm": 4.888299465179443,
2024
+ "learning_rate": 3.463526708788053e-06,
2025
+ "loss": 0.3257,
2026
+ "step": 2880
2027
+ },
2028
+ {
2029
+ "epoch": 1.6599655370476738,
2030
+ "grad_norm": 4.726266384124756,
2031
+ "learning_rate": 3.406088454910971e-06,
2032
+ "loss": 0.3573,
2033
+ "step": 2890
2034
+ },
2035
+ {
2036
+ "epoch": 1.6657093624353818,
2037
+ "grad_norm": 6.836297035217285,
2038
+ "learning_rate": 3.348650201033889e-06,
2039
+ "loss": 0.3325,
2040
+ "step": 2900
2041
+ },
2042
+ {
2043
+ "epoch": 1.67145318782309,
2044
+ "grad_norm": 6.571508884429932,
2045
+ "learning_rate": 3.2912119471568067e-06,
2046
+ "loss": 0.292,
2047
+ "step": 2910
2048
+ },
2049
+ {
2050
+ "epoch": 1.6771970132107983,
2051
+ "grad_norm": 9.843769073486328,
2052
+ "learning_rate": 3.2337736932797246e-06,
2053
+ "loss": 0.4027,
2054
+ "step": 2920
2055
+ },
2056
+ {
2057
+ "epoch": 1.6829408385985065,
2058
+ "grad_norm": 6.089911937713623,
2059
+ "learning_rate": 3.1763354394026425e-06,
2060
+ "loss": 0.3627,
2061
+ "step": 2930
2062
+ },
2063
+ {
2064
+ "epoch": 1.6886846639862148,
2065
+ "grad_norm": 6.846927165985107,
2066
+ "learning_rate": 3.11889718552556e-06,
2067
+ "loss": 0.3541,
2068
+ "step": 2940
2069
+ },
2070
+ {
2071
+ "epoch": 1.694428489373923,
2072
+ "grad_norm": 10.23181438446045,
2073
+ "learning_rate": 3.0614589316484783e-06,
2074
+ "loss": 0.3418,
2075
+ "step": 2950
2076
+ },
2077
+ {
2078
+ "epoch": 1.7001723147616312,
2079
+ "grad_norm": 5.403523921966553,
2080
+ "learning_rate": 3.0040206777713958e-06,
2081
+ "loss": 0.3498,
2082
+ "step": 2960
2083
+ },
2084
+ {
2085
+ "epoch": 1.7059161401493395,
2086
+ "grad_norm": 8.252917289733887,
2087
+ "learning_rate": 2.9465824238943137e-06,
2088
+ "loss": 0.3401,
2089
+ "step": 2970
2090
+ },
2091
+ {
2092
+ "epoch": 1.7116599655370477,
2093
+ "grad_norm": 7.06523323059082,
2094
+ "learning_rate": 2.889144170017232e-06,
2095
+ "loss": 0.2331,
2096
+ "step": 2980
2097
+ },
2098
+ {
2099
+ "epoch": 1.717403790924756,
2100
+ "grad_norm": 7.739984035491943,
2101
+ "learning_rate": 2.8317059161401494e-06,
2102
+ "loss": 0.3678,
2103
+ "step": 2990
2104
+ },
2105
+ {
+ "epoch": 1.7231476163124642,
+ "grad_norm": 4.021157741546631,
+ "learning_rate": 2.7742676622630677e-06,
+ "loss": 0.3537,
+ "step": 3000
+ }
+ ],
+ "logging_steps": 10,
+ "max_steps": 3482,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 2,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 6356845526704128.0,
+ "train_batch_size": 16,
+ "trial_name": null,
+ "trial_params": null
+ }
checkpoint-3000/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7e7cb6431ba79d66c2c9c534739c31af31901b4f69b86ef7cd6fa47d58097e4a
+ size 5368
checkpoint-3000/vocab.txt ADDED
The diff for this file is too large to render. See raw diff
checkpoint-3482/config.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "activation": "gelu",
+ "architectures": [
+ "DistilBertForSequenceClassification"
+ ],
+ "attention_dropout": 0.1,
+ "dim": 768,
+ "dropout": 0.3,
+ "dtype": "float32",
+ "hidden_dim": 3072,
+ "initializer_range": 0.02,
+ "max_position_embeddings": 512,
+ "model_type": "distilbert",
+ "n_heads": 12,
+ "n_layers": 6,
+ "pad_token_id": 0,
+ "problem_type": "single_label_classification",
+ "qa_dropout": 0.1,
+ "seq_classif_dropout": 0.2,
+ "sinusoidal_pos_embds": false,
+ "tie_weights_": true,
+ "transformers_version": "4.57.1",
+ "vocab_size": 30522
+ }
checkpoint-3482/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:44a9699f5eae1c12be68d7d6783b3db0d0136ec5a126bd85d7435845fcefef10
+ size 267832560
checkpoint-3482/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b99703846da685dc1ba9b58eb1ffbaa4ed9817271ddc7bd75053cfd6a8a0983b
+ size 535727290
checkpoint-3482/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e1af46080ed57ad2b99923ba401a4515d7b1bc2bc42a980f6723e8b5616b10fa
+ size 14244
checkpoint-3482/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9ed4587eaad0c9722ca4447ada6e6b52ba5a765ac9839bc3b828962bebeed812
+ size 1064
checkpoint-3482/special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+ "cls_token": "[CLS]",
+ "mask_token": "[MASK]",
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "unk_token": "[UNK]"
+ }
checkpoint-3482/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
checkpoint-3482/tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "101": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "102": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "103": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "clean_up_tokenization_spaces": false,
+ "cls_token": "[CLS]",
+ "do_lower_case": true,
+ "extra_special_tokens": {},
+ "mask_token": "[MASK]",
+ "model_max_length": 512,
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "DistilBertTokenizer",
+ "unk_token": "[UNK]"
+ }
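The `added_tokens_decoder` table in the tokenizer config above pins the standard BERT/DistilBERT special-token IDs (0, 100, 101, 102, 103). A minimal sketch of using that mapping — `strip_special` and the sample IDs are illustrative, not part of the repository:

```python
# Special-token ID mapping copied from added_tokens_decoder above.
SPECIAL_TOKENS = {0: "[PAD]", 100: "[UNK]", 101: "[CLS]", 102: "[SEP]", 103: "[MASK]"}

def strip_special(ids):
    """Drop [CLS]/[SEP]/[PAD]/etc. IDs from a token-ID sequence (illustrative helper)."""
    return [i for i in ids if i not in SPECIAL_TOKENS]

# Hypothetical encoded sequence: [CLS] <word> <word> [SEP] [PAD] [PAD]
print(strip_special([101, 7592, 2088, 102, 0, 0]))  # [7592, 2088]
```

In practice `DistilBertTokenizer.decode(..., skip_special_tokens=True)` does the same filtering; the dict above just makes the IDs from the config explicit.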
checkpoint-3482/trainer_state.json ADDED
@@ -0,0 +1,2470 @@
+ {
+ "best_global_step": null,
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 2.0,
+ "eval_steps": 100,
+ "global_step": 3482,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.005743825387708214,
+ "grad_norm": 1.664604902267456,
+ "learning_rate": 1.9948305571510626e-05,
+ "loss": 0.6766,
+ "step": 10
+ },
+ {
+ "epoch": 0.011487650775416428,
+ "grad_norm": 1.8649318218231201,
+ "learning_rate": 1.9890867317633546e-05,
+ "loss": 0.6443,
+ "step": 20
+ },
+ {
+ "epoch": 0.01723147616312464,
+ "grad_norm": 2.0676939487457275,
+ "learning_rate": 1.9833429063756463e-05,
+ "loss": 0.6516,
+ "step": 30
+ },
+ {
+ "epoch": 0.022975301550832855,
+ "grad_norm": 2.077414035797119,
+ "learning_rate": 1.9775990809879383e-05,
+ "loss": 0.5468,
+ "step": 40
+ },
+ {
+ "epoch": 0.02871912693854107,
+ "grad_norm": 2.4310338497161865,
+ "learning_rate": 1.97185525560023e-05,
+ "loss": 0.6289,
+ "step": 50
+ },
+ {
+ "epoch": 0.03446295232624928,
+ "grad_norm": 2.605480670928955,
+ "learning_rate": 1.9661114302125216e-05,
+ "loss": 0.583,
+ "step": 60
+ },
+ {
+ "epoch": 0.040206777713957496,
+ "grad_norm": 2.7097268104553223,
+ "learning_rate": 1.9603676048248136e-05,
+ "loss": 0.6029,
+ "step": 70
+ },
+ {
+ "epoch": 0.04595060310166571,
+ "grad_norm": 4.125054359436035,
+ "learning_rate": 1.9546237794371053e-05,
+ "loss": 0.5545,
+ "step": 80
+ },
+ {
+ "epoch": 0.051694428489373924,
+ "grad_norm": 4.592983245849609,
+ "learning_rate": 1.948879954049397e-05,
+ "loss": 0.5027,
+ "step": 90
+ },
+ {
+ "epoch": 0.05743825387708214,
+ "grad_norm": 4.799990653991699,
+ "learning_rate": 1.9431361286616886e-05,
+ "loss": 0.4913,
+ "step": 100
+ },
+ {
+ "epoch": 0.06318207926479034,
+ "grad_norm": 5.146159648895264,
+ "learning_rate": 1.9373923032739806e-05,
+ "loss": 0.5419,
+ "step": 110
+ },
+ {
+ "epoch": 0.06892590465249857,
+ "grad_norm": 6.291711807250977,
+ "learning_rate": 1.9316484778862726e-05,
+ "loss": 0.478,
+ "step": 120
+ },
+ {
+ "epoch": 0.07466973004020677,
+ "grad_norm": 4.076691627502441,
+ "learning_rate": 1.9259046524985643e-05,
+ "loss": 0.489,
+ "step": 130
+ },
+ {
+ "epoch": 0.08041355542791499,
+ "grad_norm": 5.057150363922119,
+ "learning_rate": 1.920160827110856e-05,
+ "loss": 0.5435,
+ "step": 140
+ },
+ {
+ "epoch": 0.0861573808156232,
+ "grad_norm": 3.5335211753845215,
+ "learning_rate": 1.914417001723148e-05,
+ "loss": 0.4445,
+ "step": 150
+ },
+ {
+ "epoch": 0.09190120620333142,
+ "grad_norm": 6.209961414337158,
+ "learning_rate": 1.9086731763354396e-05,
+ "loss": 0.4516,
+ "step": 160
+ },
+ {
+ "epoch": 0.09764503159103963,
+ "grad_norm": 7.1031341552734375,
+ "learning_rate": 1.9029293509477313e-05,
+ "loss": 0.576,
+ "step": 170
+ },
+ {
+ "epoch": 0.10338885697874785,
+ "grad_norm": 4.959975719451904,
+ "learning_rate": 1.8971855255600233e-05,
+ "loss": 0.4841,
+ "step": 180
+ },
+ {
+ "epoch": 0.10913268236645605,
+ "grad_norm": 4.904316425323486,
+ "learning_rate": 1.891441700172315e-05,
+ "loss": 0.5007,
+ "step": 190
+ },
+ {
+ "epoch": 0.11487650775416428,
+ "grad_norm": 8.833961486816406,
+ "learning_rate": 1.8856978747846066e-05,
+ "loss": 0.5416,
+ "step": 200
+ },
+ {
+ "epoch": 0.12062033314187248,
+ "grad_norm": 6.040881156921387,
+ "learning_rate": 1.8799540493968982e-05,
+ "loss": 0.5162,
+ "step": 210
+ },
+ {
+ "epoch": 0.1263641585295807,
+ "grad_norm": 7.167252063751221,
+ "learning_rate": 1.8742102240091902e-05,
+ "loss": 0.4965,
+ "step": 220
+ },
+ {
+ "epoch": 0.13210798391728892,
+ "grad_norm": 3.9503183364868164,
+ "learning_rate": 1.8684663986214822e-05,
+ "loss": 0.5129,
+ "step": 230
+ },
+ {
+ "epoch": 0.13785180930499713,
+ "grad_norm": 4.403937339782715,
+ "learning_rate": 1.862722573233774e-05,
+ "loss": 0.5192,
+ "step": 240
+ },
+ {
+ "epoch": 0.14359563469270534,
+ "grad_norm": 6.813930034637451,
+ "learning_rate": 1.8569787478460656e-05,
+ "loss": 0.4927,
+ "step": 250
+ },
+ {
+ "epoch": 0.14933946008041354,
+ "grad_norm": 4.355352878570557,
+ "learning_rate": 1.8512349224583576e-05,
+ "loss": 0.4857,
+ "step": 260
+ },
+ {
+ "epoch": 0.15508328546812178,
+ "grad_norm": 4.012150287628174,
+ "learning_rate": 1.8454910970706492e-05,
+ "loss": 0.3763,
+ "step": 270
+ },
+ {
+ "epoch": 0.16082711085582999,
+ "grad_norm": 7.179994106292725,
+ "learning_rate": 1.839747271682941e-05,
+ "loss": 0.4625,
+ "step": 280
+ },
+ {
+ "epoch": 0.1665709362435382,
+ "grad_norm": 3.701215982437134,
+ "learning_rate": 1.834003446295233e-05,
+ "loss": 0.4954,
+ "step": 290
+ },
+ {
+ "epoch": 0.1723147616312464,
+ "grad_norm": 5.150369167327881,
+ "learning_rate": 1.8282596209075246e-05,
+ "loss": 0.4875,
+ "step": 300
+ },
+ {
+ "epoch": 0.17805858701895463,
+ "grad_norm": 4.432060241699219,
+ "learning_rate": 1.8225157955198162e-05,
+ "loss": 0.5032,
+ "step": 310
+ },
+ {
+ "epoch": 0.18380241240666284,
+ "grad_norm": 4.560088634490967,
+ "learning_rate": 1.816771970132108e-05,
+ "loss": 0.4443,
+ "step": 320
+ },
+ {
+ "epoch": 0.18954623779437105,
+ "grad_norm": 6.000060558319092,
+ "learning_rate": 1.8110281447444e-05,
+ "loss": 0.4805,
+ "step": 330
+ },
+ {
+ "epoch": 0.19529006318207925,
+ "grad_norm": 6.462589740753174,
+ "learning_rate": 1.805284319356692e-05,
+ "loss": 0.4009,
+ "step": 340
+ },
+ {
+ "epoch": 0.2010338885697875,
+ "grad_norm": 4.6327104568481445,
+ "learning_rate": 1.7995404939689835e-05,
+ "loss": 0.5126,
+ "step": 350
+ },
+ {
+ "epoch": 0.2067777139574957,
+ "grad_norm": 5.640307426452637,
+ "learning_rate": 1.7937966685812752e-05,
+ "loss": 0.4308,
+ "step": 360
+ },
+ {
+ "epoch": 0.2125215393452039,
+ "grad_norm": 6.553674697875977,
+ "learning_rate": 1.7880528431935672e-05,
+ "loss": 0.4686,
+ "step": 370
+ },
+ {
+ "epoch": 0.2182653647329121,
+ "grad_norm": 6.482476234436035,
+ "learning_rate": 1.782309017805859e-05,
+ "loss": 0.4214,
+ "step": 380
+ },
+ {
+ "epoch": 0.22400919012062034,
+ "grad_norm": 6.897400379180908,
+ "learning_rate": 1.7765651924181505e-05,
+ "loss": 0.4827,
+ "step": 390
+ },
+ {
+ "epoch": 0.22975301550832855,
+ "grad_norm": 3.935074806213379,
+ "learning_rate": 1.7708213670304425e-05,
+ "loss": 0.4212,
+ "step": 400
+ },
+ {
+ "epoch": 0.23549684089603676,
+ "grad_norm": 5.766780853271484,
+ "learning_rate": 1.7650775416427342e-05,
+ "loss": 0.4421,
+ "step": 410
+ },
+ {
+ "epoch": 0.24124066628374496,
+ "grad_norm": 8.834909439086914,
+ "learning_rate": 1.759333716255026e-05,
+ "loss": 0.4676,
+ "step": 420
+ },
+ {
+ "epoch": 0.2469844916714532,
+ "grad_norm": 4.853402614593506,
+ "learning_rate": 1.7535898908673175e-05,
+ "loss": 0.4888,
+ "step": 430
+ },
+ {
+ "epoch": 0.2527283170591614,
+ "grad_norm": 5.489238739013672,
+ "learning_rate": 1.7478460654796095e-05,
+ "loss": 0.3763,
+ "step": 440
+ },
+ {
+ "epoch": 0.2584721424468696,
+ "grad_norm": 3.805001974105835,
+ "learning_rate": 1.7421022400919015e-05,
+ "loss": 0.5266,
+ "step": 450
+ },
+ {
+ "epoch": 0.26421596783457785,
+ "grad_norm": 5.9350080490112305,
+ "learning_rate": 1.7363584147041932e-05,
+ "loss": 0.4053,
+ "step": 460
+ },
+ {
+ "epoch": 0.269959793222286,
+ "grad_norm": 8.31826400756836,
+ "learning_rate": 1.730614589316485e-05,
+ "loss": 0.4775,
+ "step": 470
+ },
+ {
+ "epoch": 0.27570361860999426,
+ "grad_norm": 5.709866046905518,
+ "learning_rate": 1.724870763928777e-05,
+ "loss": 0.3681,
+ "step": 480
+ },
+ {
+ "epoch": 0.2814474439977025,
+ "grad_norm": 2.6657683849334717,
+ "learning_rate": 1.7191269385410685e-05,
+ "loss": 0.3913,
+ "step": 490
+ },
+ {
+ "epoch": 0.2871912693854107,
+ "grad_norm": 8.942373275756836,
+ "learning_rate": 1.71338311315336e-05,
+ "loss": 0.4454,
+ "step": 500
+ },
+ {
+ "epoch": 0.2929350947731189,
+ "grad_norm": 7.566854476928711,
+ "learning_rate": 1.707639287765652e-05,
+ "loss": 0.4704,
+ "step": 510
+ },
+ {
+ "epoch": 0.2986789201608271,
+ "grad_norm": 5.819560527801514,
+ "learning_rate": 1.7018954623779438e-05,
+ "loss": 0.4245,
+ "step": 520
+ },
+ {
+ "epoch": 0.3044227455485353,
+ "grad_norm": 7.269554138183594,
+ "learning_rate": 1.6961516369902355e-05,
+ "loss": 0.523,
+ "step": 530
+ },
+ {
+ "epoch": 0.31016657093624356,
+ "grad_norm": 4.569669246673584,
+ "learning_rate": 1.690407811602527e-05,
+ "loss": 0.5053,
+ "step": 540
+ },
+ {
+ "epoch": 0.31591039632395174,
+ "grad_norm": 6.8373332023620605,
+ "learning_rate": 1.684663986214819e-05,
+ "loss": 0.4592,
+ "step": 550
+ },
+ {
+ "epoch": 0.32165422171165997,
+ "grad_norm": 7.736797332763672,
+ "learning_rate": 1.678920160827111e-05,
+ "loss": 0.4756,
+ "step": 560
+ },
+ {
+ "epoch": 0.3273980470993682,
+ "grad_norm": 5.857370853424072,
+ "learning_rate": 1.6731763354394028e-05,
+ "loss": 0.4906,
+ "step": 570
+ },
+ {
+ "epoch": 0.3331418724870764,
+ "grad_norm": 4.2734785079956055,
+ "learning_rate": 1.6674325100516945e-05,
+ "loss": 0.4623,
+ "step": 580
+ },
+ {
+ "epoch": 0.3388856978747846,
+ "grad_norm": 7.859753131866455,
+ "learning_rate": 1.6616886846639865e-05,
+ "loss": 0.4568,
+ "step": 590
+ },
+ {
+ "epoch": 0.3446295232624928,
+ "grad_norm": 4.936352252960205,
+ "learning_rate": 1.655944859276278e-05,
+ "loss": 0.3774,
+ "step": 600
+ },
+ {
+ "epoch": 0.35037334865020103,
+ "grad_norm": 7.164617538452148,
+ "learning_rate": 1.6502010338885698e-05,
+ "loss": 0.5373,
+ "step": 610
+ },
+ {
+ "epoch": 0.35611717403790927,
+ "grad_norm": 7.279328346252441,
+ "learning_rate": 1.6444572085008618e-05,
+ "loss": 0.4183,
+ "step": 620
+ },
+ {
+ "epoch": 0.36186099942561745,
+ "grad_norm": 8.805562973022461,
+ "learning_rate": 1.6387133831131535e-05,
+ "loss": 0.4232,
+ "step": 630
+ },
+ {
+ "epoch": 0.3676048248133257,
+ "grad_norm": 6.055676460266113,
+ "learning_rate": 1.632969557725445e-05,
+ "loss": 0.4316,
+ "step": 640
+ },
+ {
+ "epoch": 0.3733486502010339,
+ "grad_norm": 5.835373401641846,
+ "learning_rate": 1.627225732337737e-05,
+ "loss": 0.4573,
+ "step": 650
+ },
+ {
+ "epoch": 0.3790924755887421,
+ "grad_norm": 2.7914931774139404,
+ "learning_rate": 1.6214819069500288e-05,
+ "loss": 0.3691,
+ "step": 660
+ },
+ {
+ "epoch": 0.38483630097645033,
+ "grad_norm": 5.780324459075928,
+ "learning_rate": 1.6157380815623208e-05,
+ "loss": 0.3761,
+ "step": 670
+ },
+ {
+ "epoch": 0.3905801263641585,
+ "grad_norm": 5.7374348640441895,
+ "learning_rate": 1.6099942561746125e-05,
+ "loss": 0.4276,
+ "step": 680
+ },
+ {
+ "epoch": 0.39632395175186674,
+ "grad_norm": 4.771572589874268,
+ "learning_rate": 1.604250430786904e-05,
+ "loss": 0.4092,
+ "step": 690
+ },
+ {
+ "epoch": 0.402067777139575,
+ "grad_norm": 7.6168928146362305,
+ "learning_rate": 1.598506605399196e-05,
+ "loss": 0.4651,
+ "step": 700
+ },
+ {
+ "epoch": 0.40781160252728316,
+ "grad_norm": 6.272294998168945,
+ "learning_rate": 1.5927627800114878e-05,
+ "loss": 0.4107,
+ "step": 710
+ },
+ {
+ "epoch": 0.4135554279149914,
+ "grad_norm": 8.060046195983887,
+ "learning_rate": 1.5870189546237794e-05,
+ "loss": 0.4117,
+ "step": 720
+ },
+ {
+ "epoch": 0.41929925330269957,
+ "grad_norm": 7.290645122528076,
+ "learning_rate": 1.5812751292360714e-05,
+ "loss": 0.5005,
+ "step": 730
+ },
+ {
+ "epoch": 0.4250430786904078,
+ "grad_norm": 8.02639102935791,
+ "learning_rate": 1.575531303848363e-05,
+ "loss": 0.4561,
+ "step": 740
+ },
+ {
+ "epoch": 0.43078690407811604,
+ "grad_norm": 8.455704689025879,
+ "learning_rate": 1.5697874784606548e-05,
+ "loss": 0.4592,
+ "step": 750
+ },
+ {
+ "epoch": 0.4365307294658242,
+ "grad_norm": 2.9264323711395264,
+ "learning_rate": 1.5640436530729468e-05,
+ "loss": 0.4532,
+ "step": 760
+ },
+ {
+ "epoch": 0.44227455485353245,
+ "grad_norm": 5.246438980102539,
+ "learning_rate": 1.5582998276852384e-05,
+ "loss": 0.4931,
+ "step": 770
+ },
+ {
+ "epoch": 0.4480183802412407,
+ "grad_norm": 3.869628667831421,
+ "learning_rate": 1.5525560022975304e-05,
+ "loss": 0.3476,
+ "step": 780
+ },
+ {
+ "epoch": 0.45376220562894887,
+ "grad_norm": 4.742978572845459,
+ "learning_rate": 1.546812176909822e-05,
+ "loss": 0.4274,
+ "step": 790
+ },
+ {
+ "epoch": 0.4595060310166571,
+ "grad_norm": 6.9326300621032715,
+ "learning_rate": 1.5410683515221138e-05,
+ "loss": 0.5299,
+ "step": 800
+ },
+ {
+ "epoch": 0.4652498564043653,
+ "grad_norm": 6.16196346282959,
+ "learning_rate": 1.5353245261344058e-05,
+ "loss": 0.4805,
+ "step": 810
+ },
+ {
+ "epoch": 0.4709936817920735,
+ "grad_norm": 3.637753486633301,
+ "learning_rate": 1.5295807007466974e-05,
+ "loss": 0.4393,
+ "step": 820
+ },
+ {
+ "epoch": 0.47673750717978175,
+ "grad_norm": 3.867966413497925,
+ "learning_rate": 1.523836875358989e-05,
+ "loss": 0.4358,
+ "step": 830
+ },
+ {
+ "epoch": 0.48248133256748993,
+ "grad_norm": 4.753304481506348,
+ "learning_rate": 1.5180930499712809e-05,
+ "loss": 0.4737,
+ "step": 840
+ },
+ {
+ "epoch": 0.48822515795519816,
+ "grad_norm": 7.7434515953063965,
+ "learning_rate": 1.5123492245835727e-05,
+ "loss": 0.4466,
+ "step": 850
+ },
+ {
+ "epoch": 0.4939689833429064,
+ "grad_norm": 6.001241683959961,
+ "learning_rate": 1.5066053991958644e-05,
+ "loss": 0.4016,
+ "step": 860
+ },
+ {
+ "epoch": 0.4997128087306146,
+ "grad_norm": 7.710599422454834,
+ "learning_rate": 1.5008615738081564e-05,
+ "loss": 0.4758,
+ "step": 870
+ },
+ {
+ "epoch": 0.5054566341183228,
+ "grad_norm": 5.139091491699219,
+ "learning_rate": 1.4951177484204482e-05,
+ "loss": 0.4019,
+ "step": 880
+ },
+ {
+ "epoch": 0.511200459506031,
+ "grad_norm": 9.428766250610352,
+ "learning_rate": 1.4893739230327399e-05,
+ "loss": 0.3742,
+ "step": 890
+ },
+ {
+ "epoch": 0.5169442848937392,
+ "grad_norm": 9.071430206298828,
+ "learning_rate": 1.4836300976450317e-05,
+ "loss": 0.4183,
+ "step": 900
+ },
+ {
+ "epoch": 0.5226881102814475,
+ "grad_norm": 10.34457015991211,
+ "learning_rate": 1.4778862722573236e-05,
+ "loss": 0.437,
+ "step": 910
+ },
+ {
+ "epoch": 0.5284319356691557,
+ "grad_norm": 7.01099967956543,
+ "learning_rate": 1.4721424468696152e-05,
+ "loss": 0.4421,
+ "step": 920
+ },
+ {
+ "epoch": 0.5341757610568638,
+ "grad_norm": 3.454484701156616,
+ "learning_rate": 1.466398621481907e-05,
+ "loss": 0.4283,
+ "step": 930
+ },
+ {
+ "epoch": 0.539919586444572,
+ "grad_norm": 5.86787223815918,
+ "learning_rate": 1.4606547960941987e-05,
+ "loss": 0.4094,
+ "step": 940
+ },
+ {
+ "epoch": 0.5456634118322803,
+ "grad_norm": 8.41640567779541,
+ "learning_rate": 1.4549109707064906e-05,
+ "loss": 0.414,
+ "step": 950
+ },
+ {
+ "epoch": 0.5514072372199885,
+ "grad_norm": 6.936724662780762,
+ "learning_rate": 1.4491671453187824e-05,
+ "loss": 0.3936,
+ "step": 960
+ },
+ {
+ "epoch": 0.5571510626076968,
+ "grad_norm": 7.984805583953857,
+ "learning_rate": 1.4434233199310744e-05,
+ "loss": 0.4071,
+ "step": 970
+ },
+ {
+ "epoch": 0.562894887995405,
+ "grad_norm": 4.381742477416992,
+ "learning_rate": 1.437679494543366e-05,
+ "loss": 0.3393,
+ "step": 980
+ },
+ {
+ "epoch": 0.5686387133831131,
+ "grad_norm": 7.115252494812012,
+ "learning_rate": 1.4319356691556579e-05,
+ "loss": 0.5616,
+ "step": 990
+ },
+ {
+ "epoch": 0.5743825387708213,
+ "grad_norm": 6.60570764541626,
+ "learning_rate": 1.4261918437679495e-05,
+ "loss": 0.4633,
+ "step": 1000
+ },
+ {
+ "epoch": 0.5801263641585296,
+ "grad_norm": 9.552061080932617,
+ "learning_rate": 1.4204480183802414e-05,
+ "loss": 0.4973,
+ "step": 1010
+ },
+ {
+ "epoch": 0.5858701895462378,
+ "grad_norm": 3.6467580795288086,
+ "learning_rate": 1.4147041929925332e-05,
+ "loss": 0.3958,
+ "step": 1020
+ },
+ {
+ "epoch": 0.591614014933946,
+ "grad_norm": 2.663799524307251,
+ "learning_rate": 1.4089603676048249e-05,
+ "loss": 0.3402,
+ "step": 1030
+ },
+ {
+ "epoch": 0.5973578403216542,
+ "grad_norm": 7.468900203704834,
+ "learning_rate": 1.4032165422171167e-05,
+ "loss": 0.3341,
+ "step": 1040
+ },
+ {
+ "epoch": 0.6031016657093624,
+ "grad_norm": 10.396268844604492,
+ "learning_rate": 1.3974727168294084e-05,
+ "loss": 0.5122,
+ "step": 1050
+ },
+ {
+ "epoch": 0.6088454910970706,
+ "grad_norm": 7.79450798034668,
+ "learning_rate": 1.3917288914417002e-05,
+ "loss": 0.5581,
+ "step": 1060
+ },
+ {
+ "epoch": 0.6145893164847789,
+ "grad_norm": 8.077397346496582,
+ "learning_rate": 1.385985066053992e-05,
+ "loss": 0.4247,
+ "step": 1070
+ },
+ {
+ "epoch": 0.6203331418724871,
+ "grad_norm": 8.327542304992676,
+ "learning_rate": 1.380241240666284e-05,
+ "loss": 0.4122,
+ "step": 1080
+ },
+ {
+ "epoch": 0.6260769672601952,
+ "grad_norm": 7.940774917602539,
+ "learning_rate": 1.3744974152785757e-05,
+ "loss": 0.5199,
+ "step": 1090
+ },
+ {
+ "epoch": 0.6318207926479035,
+ "grad_norm": 5.148271560668945,
+ "learning_rate": 1.3687535898908675e-05,
+ "loss": 0.4534,
+ "step": 1100
+ },
+ {
+ "epoch": 0.6375646180356117,
+ "grad_norm": 7.042996883392334,
+ "learning_rate": 1.3630097645031592e-05,
+ "loss": 0.4737,
+ "step": 1110
+ },
+ {
+ "epoch": 0.6433084434233199,
+ "grad_norm": 5.131284236907959,
+ "learning_rate": 1.357265939115451e-05,
+ "loss": 0.3637,
+ "step": 1120
+ },
+ {
+ "epoch": 0.6490522688110282,
+ "grad_norm": 10.73865795135498,
+ "learning_rate": 1.3515221137277428e-05,
+ "loss": 0.4152,
+ "step": 1130
+ },
+ {
+ "epoch": 0.6547960941987364,
+ "grad_norm": 6.061016082763672,
+ "learning_rate": 1.3457782883400345e-05,
+ "loss": 0.343,
+ "step": 1140
+ },
+ {
+ "epoch": 0.6605399195864445,
+ "grad_norm": 11.200201988220215,
+ "learning_rate": 1.3400344629523263e-05,
+ "loss": 0.4781,
+ "step": 1150
+ },
+ {
+ "epoch": 0.6662837449741528,
+ "grad_norm": 6.987539768218994,
+ "learning_rate": 1.334290637564618e-05,
+ "loss": 0.4046,
+ "step": 1160
+ },
+ {
+ "epoch": 0.672027570361861,
+ "grad_norm": 7.3787713050842285,
+ "learning_rate": 1.3285468121769098e-05,
+ "loss": 0.4136,
+ "step": 1170
+ },
+ {
+ "epoch": 0.6777713957495692,
+ "grad_norm": 8.829427719116211,
+ "learning_rate": 1.3228029867892018e-05,
+ "loss": 0.3807,
+ "step": 1180
+ },
+ {
+ "epoch": 0.6835152211372775,
+ "grad_norm": 9.648842811584473,
+ "learning_rate": 1.3170591614014937e-05,
+ "loss": 0.3273,
+ "step": 1190
+ },
+ {
+ "epoch": 0.6892590465249856,
+ "grad_norm": 7.307587146759033,
+ "learning_rate": 1.3113153360137853e-05,
+ "loss": 0.3351,
+ "step": 1200
+ },
+ {
+ "epoch": 0.6950028719126938,
+ "grad_norm": 6.445584297180176,
+ "learning_rate": 1.3055715106260772e-05,
+ "loss": 0.4776,
+ "step": 1210
+ },
+ {
+ "epoch": 0.7007466973004021,
+ "grad_norm": 7.078349590301514,
+ "learning_rate": 1.2998276852383688e-05,
+ "loss": 0.4438,
+ "step": 1220
+ },
+ {
+ "epoch": 0.7064905226881103,
+ "grad_norm": 9.63571834564209,
+ "learning_rate": 1.2940838598506606e-05,
+ "loss": 0.4027,
+ "step": 1230
+ },
+ {
+ "epoch": 0.7122343480758185,
+ "grad_norm": 3.2861080169677734,
+ "learning_rate": 1.2883400344629525e-05,
+ "loss": 0.3334,
+ "step": 1240
+ },
+ {
+ "epoch": 0.7179781734635267,
+ "grad_norm": 7.433917999267578,
+ "learning_rate": 1.2825962090752441e-05,
+ "loss": 0.4418,
+ "step": 1250
+ },
+ {
+ "epoch": 0.7237219988512349,
+ "grad_norm": 7.511765003204346,
+ "learning_rate": 1.276852383687536e-05,
+ "loss": 0.3693,
+ "step": 1260
+ },
+ {
+ "epoch": 0.7294658242389431,
+ "grad_norm": 9.38160228729248,
+ "learning_rate": 1.2711085582998276e-05,
+ "loss": 0.3989,
+ "step": 1270
+ },
+ {
+ "epoch": 0.7352096496266514,
+ "grad_norm": 2.012756586074829,
+ "learning_rate": 1.2653647329121195e-05,
+ "loss": 0.3867,
+ "step": 1280
+ },
+ {
+ "epoch": 0.7409534750143596,
+ "grad_norm": 6.777096748352051,
+ "learning_rate": 1.2596209075244115e-05,
+ "loss": 0.5528,
+ "step": 1290
+ },
+ {
+ "epoch": 0.7466973004020678,
+ "grad_norm": 6.928145885467529,
+ "learning_rate": 1.2538770821367033e-05,
+ "loss": 0.3403,
+ "step": 1300
+ },
+ {
+ "epoch": 0.752441125789776,
+ "grad_norm": 7.99967622756958,
+ "learning_rate": 1.248133256748995e-05,
+ "loss": 0.5006,
+ "step": 1310
+ },
+ {
+ "epoch": 0.7581849511774842,
+ "grad_norm": 4.8364033699035645,
+ "learning_rate": 1.2423894313612868e-05,
+ "loss": 0.4266,
+ "step": 1320
+ },
+ {
+ "epoch": 0.7639287765651924,
+ "grad_norm": 4.017831802368164,
+ "learning_rate": 1.2366456059735785e-05,
+ "loss": 0.3444,
+ "step": 1330
+ },
+ {
+ "epoch": 0.7696726019529007,
+ "grad_norm": 5.27449893951416,
+ "learning_rate": 1.2309017805858703e-05,
+ "loss": 0.3962,
+ "step": 1340
+ },
+ {
+ "epoch": 0.7754164273406089,
+ "grad_norm": 7.989853858947754,
+ "learning_rate": 1.2251579551981621e-05,
+ "loss": 0.4172,
+ "step": 1350
+ },
+ {
+ "epoch": 0.781160252728317,
+ "grad_norm": 5.878440856933594,
+ "learning_rate": 1.2194141298104538e-05,
+ "loss": 0.4723,
+ "step": 1360
+ },
+ {
+ "epoch": 0.7869040781160253,
+ "grad_norm": 9.140411376953125,
+ "learning_rate": 1.2136703044227456e-05,
+ "loss": 0.4012,
+ "step": 1370
+ },
+ {
+ "epoch": 0.7926479035037335,
+ "grad_norm": 3.9119012355804443,
+ "learning_rate": 1.2079264790350373e-05,
+ "loss": 0.3545,
+ "step": 1380
+ },
+ {
+ "epoch": 0.7983917288914417,
+ "grad_norm": 6.248933792114258,
+ "learning_rate": 1.2021826536473291e-05,
+ "loss": 0.4787,
+ "step": 1390
+ },
+ {
+ "epoch": 0.80413555427915,
+ "grad_norm": 8.478959083557129,
+ "learning_rate": 1.1964388282596211e-05,
+ "loss": 0.4077,
+ "step": 1400
+ },
+ {
+ "epoch": 0.8098793796668581,
+ "grad_norm": 6.234384059906006,
+ "learning_rate": 1.190695002871913e-05,
+ "loss": 0.4044,
+ "step": 1410
+ },
+ {
+ "epoch": 0.8156232050545663,
+ "grad_norm": 5.093031883239746,
+ "learning_rate": 1.1849511774842046e-05,
+ "loss": 0.3298,
+ "step": 1420
+ },
+ {
+ "epoch": 0.8213670304422745,
+ "grad_norm": 5.755350112915039,
+ "learning_rate": 1.1792073520964964e-05,
+ "loss": 0.4099,
+ "step": 1430
+ },
+ {
+ "epoch": 0.8271108558299828,
+ "grad_norm": 9.269704818725586,
+ "learning_rate": 1.1734635267087881e-05,
+ "loss": 0.366,
+ "step": 1440
+ },
+ {
+ "epoch": 0.832854681217691,
+ "grad_norm": 4.977533340454102,
+ "learning_rate": 1.16771970132108e-05,
+ "loss": 0.3156,
+ "step": 1450
+ },
+ {
+ "epoch": 0.8385985066053991,
+ "grad_norm": 6.767063140869141,
+ "learning_rate": 1.1619758759333718e-05,
+ "loss": 0.3966,
+ "step": 1460
+ },
+ {
+ "epoch": 0.8443423319931074,
+ "grad_norm": 6.855627536773682,
+ "learning_rate": 1.1562320505456634e-05,
+ "loss": 0.3488,
+ "step": 1470
+ },
+ {
+ "epoch": 0.8500861573808156,
+ "grad_norm": 3.408679723739624,
+ "learning_rate": 1.1504882251579552e-05,
+ "loss": 0.4016,
+ "step": 1480
+ },
1048
+ {
1049
+ "epoch": 0.8558299827685238,
1050
+ "grad_norm": 8.43376636505127,
1051
+ "learning_rate": 1.1447443997702469e-05,
1052
+ "loss": 0.3951,
1053
+ "step": 1490
1054
+ },
1055
+ {
1056
+ "epoch": 0.8615738081562321,
1057
+ "grad_norm": 7.106573104858398,
1058
+ "learning_rate": 1.1390005743825389e-05,
1059
+ "loss": 0.3056,
1060
+ "step": 1500
1061
+ },
1062
+ {
1063
+ "epoch": 0.8673176335439403,
1064
+ "grad_norm": 3.373734474182129,
1065
+ "learning_rate": 1.1332567489948307e-05,
1066
+ "loss": 0.4548,
1067
+ "step": 1510
1068
+ },
1069
+ {
1070
+ "epoch": 0.8730614589316484,
1071
+ "grad_norm": 4.657841205596924,
1072
+ "learning_rate": 1.1275129236071226e-05,
1073
+ "loss": 0.36,
1074
+ "step": 1520
1075
+ },
1076
+ {
1077
+ "epoch": 0.8788052843193567,
1078
+ "grad_norm": 8.218329429626465,
1079
+ "learning_rate": 1.1217690982194142e-05,
1080
+ "loss": 0.4297,
1081
+ "step": 1530
1082
+ },
1083
+ {
1084
+ "epoch": 0.8845491097070649,
1085
+ "grad_norm": 7.709052562713623,
1086
+ "learning_rate": 1.116025272831706e-05,
1087
+ "loss": 0.3766,
1088
+ "step": 1540
1089
+ },
1090
+ {
1091
+ "epoch": 0.8902929350947731,
1092
+ "grad_norm": 6.875143527984619,
1093
+ "learning_rate": 1.1102814474439977e-05,
1094
+ "loss": 0.4106,
1095
+ "step": 1550
1096
+ },
1097
+ {
1098
+ "epoch": 0.8960367604824814,
1099
+ "grad_norm": 5.460892200469971,
1100
+ "learning_rate": 1.1045376220562896e-05,
1101
+ "loss": 0.3016,
1102
+ "step": 1560
1103
+ },
1104
+ {
1105
+ "epoch": 0.9017805858701895,
1106
+ "grad_norm": 3.4830429553985596,
1107
+ "learning_rate": 1.0987937966685814e-05,
1108
+ "loss": 0.371,
1109
+ "step": 1570
1110
+ },
1111
+ {
1112
+ "epoch": 0.9075244112578977,
1113
+ "grad_norm": 8.233579635620117,
1114
+ "learning_rate": 1.093049971280873e-05,
1115
+ "loss": 0.4591,
1116
+ "step": 1580
1117
+ },
1118
+ {
1119
+ "epoch": 0.913268236645606,
1120
+ "grad_norm": 7.001081466674805,
+ "learning_rate": 1.0873061458931649e-05,
+ "loss": 0.3856,
+ "step": 1590
+ },
+ {
+ "epoch": 0.9190120620333142,
+ "grad_norm": 7.473963260650635,
+ "learning_rate": 1.0815623205054565e-05,
+ "loss": 0.4937,
+ "step": 1600
+ },
+ {
+ "epoch": 0.9247558874210224,
+ "grad_norm": 4.046863079071045,
+ "learning_rate": 1.0758184951177485e-05,
+ "loss": 0.4141,
+ "step": 1610
+ },
+ {
+ "epoch": 0.9304997128087306,
+ "grad_norm": 5.425885200500488,
+ "learning_rate": 1.0700746697300404e-05,
+ "loss": 0.3545,
+ "step": 1620
+ },
+ {
+ "epoch": 0.9362435381964388,
+ "grad_norm": 5.255967140197754,
+ "learning_rate": 1.0643308443423322e-05,
+ "loss": 0.3881,
+ "step": 1630
+ },
+ {
+ "epoch": 0.941987363584147,
+ "grad_norm": 7.365703105926514,
+ "learning_rate": 1.0585870189546239e-05,
+ "loss": 0.3648,
+ "step": 1640
+ },
+ {
+ "epoch": 0.9477311889718553,
+ "grad_norm": 5.149658679962158,
+ "learning_rate": 1.0528431935669157e-05,
+ "loss": 0.2952,
+ "step": 1650
+ },
+ {
+ "epoch": 0.9534750143595635,
+ "grad_norm": 5.5968194007873535,
+ "learning_rate": 1.0470993681792074e-05,
+ "loss": 0.3604,
+ "step": 1660
+ },
+ {
+ "epoch": 0.9592188397472717,
+ "grad_norm": 6.176368713378906,
+ "learning_rate": 1.0413555427914992e-05,
+ "loss": 0.3584,
+ "step": 1670
+ },
+ {
+ "epoch": 0.9649626651349799,
+ "grad_norm": 5.876338958740234,
+ "learning_rate": 1.035611717403791e-05,
+ "loss": 0.4076,
+ "step": 1680
+ },
+ {
+ "epoch": 0.9707064905226881,
+ "grad_norm": 11.697908401489258,
+ "learning_rate": 1.0298678920160827e-05,
+ "loss": 0.3648,
+ "step": 1690
+ },
+ {
+ "epoch": 0.9764503159103963,
+ "grad_norm": 7.04163122177124,
+ "learning_rate": 1.0241240666283745e-05,
+ "loss": 0.349,
+ "step": 1700
+ },
+ {
+ "epoch": 0.9821941412981046,
+ "grad_norm": 6.7707133293151855,
+ "learning_rate": 1.0183802412406662e-05,
+ "loss": 0.3888,
+ "step": 1710
+ },
+ {
+ "epoch": 0.9879379666858128,
+ "grad_norm": 3.8108270168304443,
+ "learning_rate": 1.0126364158529582e-05,
+ "loss": 0.3753,
+ "step": 1720
+ },
+ {
+ "epoch": 0.9936817920735209,
+ "grad_norm": 11.013320922851562,
+ "learning_rate": 1.00689259046525e-05,
+ "loss": 0.3977,
+ "step": 1730
+ },
+ {
+ "epoch": 0.9994256174612292,
+ "grad_norm": 4.042791843414307,
+ "learning_rate": 1.0011487650775419e-05,
+ "loss": 0.3724,
+ "step": 1740
+ },
+ {
+ "epoch": 1.0051694428489375,
+ "grad_norm": 6.258309841156006,
+ "learning_rate": 9.954049396898335e-06,
+ "loss": 0.3591,
+ "step": 1750
+ },
+ {
+ "epoch": 1.0109132682366455,
+ "grad_norm": 7.884782314300537,
+ "learning_rate": 9.896611143021253e-06,
+ "loss": 0.4114,
+ "step": 1760
+ },
+ {
+ "epoch": 1.0166570936243537,
+ "grad_norm": 4.9663567543029785,
+ "learning_rate": 9.83917288914417e-06,
+ "loss": 0.4462,
+ "step": 1770
+ },
+ {
+ "epoch": 1.022400919012062,
+ "grad_norm": 7.046320915222168,
+ "learning_rate": 9.781734635267088e-06,
+ "loss": 0.3386,
+ "step": 1780
+ },
+ {
+ "epoch": 1.0281447443997702,
+ "grad_norm": 6.846945762634277,
+ "learning_rate": 9.724296381390007e-06,
+ "loss": 0.3181,
+ "step": 1790
+ },
+ {
+ "epoch": 1.0338885697874785,
+ "grad_norm": 5.925526142120361,
+ "learning_rate": 9.666858127512925e-06,
+ "loss": 0.3357,
+ "step": 1800
+ },
+ {
+ "epoch": 1.0396323951751867,
+ "grad_norm": 14.302725791931152,
+ "learning_rate": 9.609419873635842e-06,
+ "loss": 0.2859,
+ "step": 1810
+ },
+ {
+ "epoch": 1.045376220562895,
+ "grad_norm": 8.27291488647461,
+ "learning_rate": 9.55198161975876e-06,
+ "loss": 0.3688,
+ "step": 1820
+ },
+ {
+ "epoch": 1.0511200459506032,
+ "grad_norm": 5.266950607299805,
+ "learning_rate": 9.494543365881678e-06,
+ "loss": 0.3106,
+ "step": 1830
+ },
+ {
+ "epoch": 1.0568638713383114,
+ "grad_norm": 11.347005844116211,
+ "learning_rate": 9.437105112004595e-06,
+ "loss": 0.3389,
+ "step": 1840
+ },
+ {
+ "epoch": 1.0626076967260196,
+ "grad_norm": 4.7072906494140625,
+ "learning_rate": 9.379666858127515e-06,
+ "loss": 0.3519,
+ "step": 1850
+ },
+ {
+ "epoch": 1.0683515221137276,
+ "grad_norm": 4.05309534072876,
+ "learning_rate": 9.322228604250432e-06,
+ "loss": 0.3006,
+ "step": 1860
+ },
+ {
+ "epoch": 1.0740953475014359,
+ "grad_norm": 5.578520774841309,
+ "learning_rate": 9.26479035037335e-06,
+ "loss": 0.3645,
+ "step": 1870
+ },
+ {
+ "epoch": 1.079839172889144,
+ "grad_norm": 7.405791282653809,
+ "learning_rate": 9.207352096496266e-06,
+ "loss": 0.3104,
+ "step": 1880
+ },
+ {
+ "epoch": 1.0855829982768523,
+ "grad_norm": 9.269173622131348,
+ "learning_rate": 9.149913842619185e-06,
+ "loss": 0.3576,
+ "step": 1890
+ },
+ {
+ "epoch": 1.0913268236645606,
+ "grad_norm": 5.276297569274902,
+ "learning_rate": 9.092475588742103e-06,
+ "loss": 0.3354,
+ "step": 1900
+ },
+ {
+ "epoch": 1.0970706490522688,
+ "grad_norm": 8.320406913757324,
+ "learning_rate": 9.035037334865021e-06,
+ "loss": 0.3232,
+ "step": 1910
+ },
+ {
+ "epoch": 1.102814474439977,
+ "grad_norm": 6.023215293884277,
+ "learning_rate": 8.977599080987938e-06,
+ "loss": 0.3427,
+ "step": 1920
+ },
+ {
+ "epoch": 1.1085582998276853,
+ "grad_norm": 8.178590774536133,
+ "learning_rate": 8.920160827110856e-06,
+ "loss": 0.3103,
+ "step": 1930
+ },
+ {
+ "epoch": 1.1143021252153935,
+ "grad_norm": 6.056619644165039,
+ "learning_rate": 8.862722573233775e-06,
+ "loss": 0.3848,
+ "step": 1940
+ },
+ {
+ "epoch": 1.1200459506031017,
+ "grad_norm": 6.109485626220703,
+ "learning_rate": 8.805284319356693e-06,
+ "loss": 0.3102,
+ "step": 1950
+ },
+ {
+ "epoch": 1.12578977599081,
+ "grad_norm": 6.949984550476074,
+ "learning_rate": 8.747846065479611e-06,
+ "loss": 0.3013,
+ "step": 1960
+ },
+ {
+ "epoch": 1.1315336013785182,
+ "grad_norm": 4.320880889892578,
+ "learning_rate": 8.690407811602528e-06,
+ "loss": 0.3169,
+ "step": 1970
+ },
+ {
+ "epoch": 1.1372774267662262,
+ "grad_norm": 9.62964916229248,
+ "learning_rate": 8.632969557725446e-06,
+ "loss": 0.3577,
+ "step": 1980
+ },
+ {
+ "epoch": 1.1430212521539345,
+ "grad_norm": 7.1865105628967285,
+ "learning_rate": 8.575531303848363e-06,
+ "loss": 0.4245,
+ "step": 1990
+ },
+ {
+ "epoch": 1.1487650775416427,
+ "grad_norm": 11.42944622039795,
+ "learning_rate": 8.518093049971281e-06,
+ "loss": 0.3421,
+ "step": 2000
+ },
+ {
+ "epoch": 1.154508902929351,
+ "grad_norm": 10.365814208984375,
+ "learning_rate": 8.4606547960942e-06,
+ "loss": 0.2971,
+ "step": 2010
+ },
+ {
+ "epoch": 1.1602527283170592,
+ "grad_norm": 4.546888828277588,
+ "learning_rate": 8.403216542217118e-06,
+ "loss": 0.3762,
+ "step": 2020
+ },
+ {
+ "epoch": 1.1659965537047674,
+ "grad_norm": 9.672823905944824,
+ "learning_rate": 8.345778288340034e-06,
+ "loss": 0.3411,
+ "step": 2030
+ },
+ {
+ "epoch": 1.1717403790924756,
+ "grad_norm": 4.738915920257568,
+ "learning_rate": 8.288340034462953e-06,
+ "loss": 0.3453,
+ "step": 2040
+ },
+ {
+ "epoch": 1.1774842044801839,
+ "grad_norm": 10.187810897827148,
+ "learning_rate": 8.230901780585871e-06,
+ "loss": 0.3351,
+ "step": 2050
+ },
+ {
+ "epoch": 1.183228029867892,
+ "grad_norm": 6.290671348571777,
+ "learning_rate": 8.17346352670879e-06,
+ "loss": 0.3286,
+ "step": 2060
+ },
+ {
+ "epoch": 1.1889718552556001,
+ "grad_norm": 9.14261531829834,
+ "learning_rate": 8.116025272831708e-06,
+ "loss": 0.2878,
+ "step": 2070
+ },
+ {
+ "epoch": 1.1947156806433084,
+ "grad_norm": 7.814758777618408,
+ "learning_rate": 8.058587018954624e-06,
+ "loss": 0.3469,
+ "step": 2080
+ },
+ {
+ "epoch": 1.2004595060310166,
+ "grad_norm": 10.085731506347656,
+ "learning_rate": 8.001148765077543e-06,
+ "loss": 0.3025,
+ "step": 2090
+ },
+ {
+ "epoch": 1.2062033314187248,
+ "grad_norm": 10.734376907348633,
+ "learning_rate": 7.94371051120046e-06,
+ "loss": 0.2442,
+ "step": 2100
+ },
+ {
+ "epoch": 1.211947156806433,
+ "grad_norm": 13.61286735534668,
+ "learning_rate": 7.88627225732338e-06,
+ "loss": 0.3063,
+ "step": 2110
+ },
+ {
+ "epoch": 1.2176909821941413,
+ "grad_norm": 8.572850227355957,
+ "learning_rate": 7.828834003446296e-06,
+ "loss": 0.3863,
+ "step": 2120
+ },
+ {
+ "epoch": 1.2234348075818495,
+ "grad_norm": 6.247170448303223,
+ "learning_rate": 7.771395749569214e-06,
+ "loss": 0.3214,
+ "step": 2130
+ },
+ {
+ "epoch": 1.2291786329695578,
+ "grad_norm": 7.438636779785156,
+ "learning_rate": 7.71395749569213e-06,
+ "loss": 0.381,
+ "step": 2140
+ },
+ {
+ "epoch": 1.234922458357266,
+ "grad_norm": 6.11846399307251,
+ "learning_rate": 7.656519241815049e-06,
+ "loss": 0.3556,
+ "step": 2150
+ },
+ {
+ "epoch": 1.2406662837449742,
+ "grad_norm": 10.697092056274414,
+ "learning_rate": 7.5990809879379666e-06,
+ "loss": 0.3767,
+ "step": 2160
+ },
+ {
+ "epoch": 1.2464101091326825,
+ "grad_norm": 5.3118205070495605,
+ "learning_rate": 7.541642734060886e-06,
+ "loss": 0.3112,
+ "step": 2170
+ },
+ {
+ "epoch": 1.2521539345203907,
+ "grad_norm": 5.907925128936768,
+ "learning_rate": 7.484204480183803e-06,
+ "loss": 0.2833,
+ "step": 2180
+ },
+ {
+ "epoch": 1.2578977599080987,
+ "grad_norm": 7.271302223205566,
+ "learning_rate": 7.426766226306721e-06,
+ "loss": 0.1935,
+ "step": 2190
+ },
+ {
+ "epoch": 1.263641585295807,
+ "grad_norm": 12.389423370361328,
+ "learning_rate": 7.369327972429638e-06,
+ "loss": 0.3946,
+ "step": 2200
+ },
+ {
+ "epoch": 1.2693854106835152,
+ "grad_norm": 9.09422492980957,
+ "learning_rate": 7.3118897185525564e-06,
+ "loss": 0.3191,
+ "step": 2210
+ },
+ {
+ "epoch": 1.2751292360712234,
+ "grad_norm": 8.75156307220459,
+ "learning_rate": 7.254451464675475e-06,
+ "loss": 0.4077,
+ "step": 2220
+ },
+ {
+ "epoch": 1.2808730614589316,
+ "grad_norm": 7.306863784790039,
+ "learning_rate": 7.197013210798392e-06,
+ "loss": 0.3295,
+ "step": 2230
+ },
+ {
+ "epoch": 1.2866168868466399,
+ "grad_norm": 9.715473175048828,
+ "learning_rate": 7.1395749569213105e-06,
+ "loss": 0.4163,
+ "step": 2240
+ },
+ {
+ "epoch": 1.2923607122343481,
+ "grad_norm": 6.315252780914307,
+ "learning_rate": 7.082136703044228e-06,
+ "loss": 0.3817,
+ "step": 2250
+ },
+ {
+ "epoch": 1.2981045376220564,
+ "grad_norm": 8.821859359741211,
+ "learning_rate": 7.0246984491671455e-06,
+ "loss": 0.3265,
+ "step": 2260
+ },
+ {
+ "epoch": 1.3038483630097644,
+ "grad_norm": 6.838233947753906,
+ "learning_rate": 6.967260195290065e-06,
+ "loss": 0.2966,
+ "step": 2270
+ },
+ {
+ "epoch": 1.3095921883974726,
+ "grad_norm": 9.925073623657227,
+ "learning_rate": 6.909821941412982e-06,
+ "loss": 0.4484,
+ "step": 2280
+ },
+ {
+ "epoch": 1.3153360137851808,
+ "grad_norm": 5.026411056518555,
+ "learning_rate": 6.8523836875358996e-06,
+ "loss": 0.4647,
+ "step": 2290
+ },
+ {
+ "epoch": 1.321079839172889,
+ "grad_norm": 4.3956732749938965,
+ "learning_rate": 6.794945433658817e-06,
+ "loss": 0.2968,
+ "step": 2300
+ },
+ {
+ "epoch": 1.3268236645605973,
+ "grad_norm": 6.904971599578857,
+ "learning_rate": 6.7375071797817345e-06,
+ "loss": 0.2888,
+ "step": 2310
+ },
+ {
+ "epoch": 1.3325674899483055,
+ "grad_norm": 3.1684281826019287,
+ "learning_rate": 6.680068925904653e-06,
+ "loss": 0.2975,
+ "step": 2320
+ },
+ {
+ "epoch": 1.3383113153360138,
+ "grad_norm": 7.3333420753479,
+ "learning_rate": 6.622630672027571e-06,
+ "loss": 0.3911,
+ "step": 2330
+ },
+ {
+ "epoch": 1.344055140723722,
+ "grad_norm": 7.822445392608643,
+ "learning_rate": 6.565192418150489e-06,
+ "loss": 0.3199,
+ "step": 2340
+ },
+ {
+ "epoch": 1.3497989661114302,
+ "grad_norm": 9.02872371673584,
+ "learning_rate": 6.507754164273407e-06,
+ "loss": 0.4698,
+ "step": 2350
+ },
+ {
+ "epoch": 1.3555427914991385,
+ "grad_norm": 3.9332520961761475,
+ "learning_rate": 6.450315910396324e-06,
+ "loss": 0.3651,
+ "step": 2360
+ },
+ {
+ "epoch": 1.3612866168868467,
+ "grad_norm": 7.590347766876221,
+ "learning_rate": 6.392877656519242e-06,
+ "loss": 0.3121,
+ "step": 2370
+ },
+ {
+ "epoch": 1.367030442274555,
+ "grad_norm": 8.964584350585938,
+ "learning_rate": 6.335439402642161e-06,
+ "loss": 0.271,
+ "step": 2380
+ },
+ {
+ "epoch": 1.3727742676622632,
+ "grad_norm": 8.058918952941895,
+ "learning_rate": 6.2780011487650785e-06,
+ "loss": 0.3906,
+ "step": 2390
+ },
+ {
+ "epoch": 1.3785180930499714,
+ "grad_norm": 6.742099761962891,
+ "learning_rate": 6.220562894887996e-06,
+ "loss": 0.3072,
+ "step": 2400
+ },
+ {
+ "epoch": 1.3842619184376794,
+ "grad_norm": 5.961569309234619,
+ "learning_rate": 6.1631246410109134e-06,
+ "loss": 0.3673,
+ "step": 2410
+ },
+ {
+ "epoch": 1.3900057438253877,
+ "grad_norm": 9.705893516540527,
+ "learning_rate": 6.105686387133831e-06,
+ "loss": 0.3544,
+ "step": 2420
+ },
+ {
+ "epoch": 1.395749569213096,
+ "grad_norm": 4.435375690460205,
+ "learning_rate": 6.04824813325675e-06,
+ "loss": 0.2372,
+ "step": 2430
+ },
+ {
+ "epoch": 1.4014933946008041,
+ "grad_norm": 5.375720977783203,
+ "learning_rate": 5.9908098793796675e-06,
+ "loss": 0.2264,
+ "step": 2440
+ },
+ {
+ "epoch": 1.4072372199885124,
+ "grad_norm": 5.602358818054199,
+ "learning_rate": 5.933371625502585e-06,
+ "loss": 0.3449,
+ "step": 2450
+ },
+ {
+ "epoch": 1.4129810453762206,
+ "grad_norm": 10.811373710632324,
+ "learning_rate": 5.875933371625503e-06,
+ "loss": 0.3663,
+ "step": 2460
+ },
+ {
+ "epoch": 1.4187248707639288,
+ "grad_norm": 10.196518898010254,
+ "learning_rate": 5.818495117748421e-06,
+ "loss": 0.3546,
+ "step": 2470
+ },
+ {
+ "epoch": 1.424468696151637,
+ "grad_norm": 10.06306266784668,
+ "learning_rate": 5.761056863871339e-06,
+ "loss": 0.3282,
+ "step": 2480
+ },
+ {
+ "epoch": 1.430212521539345,
+ "grad_norm": 4.978325843811035,
+ "learning_rate": 5.703618609994257e-06,
+ "loss": 0.2961,
+ "step": 2490
+ },
+ {
+ "epoch": 1.4359563469270533,
+ "grad_norm": 10.731146812438965,
+ "learning_rate": 5.646180356117175e-06,
+ "loss": 0.3349,
+ "step": 2500
+ },
+ {
+ "epoch": 1.4417001723147616,
+ "grad_norm": 8.913891792297363,
+ "learning_rate": 5.588742102240092e-06,
+ "loss": 0.3131,
+ "step": 2510
+ },
+ {
+ "epoch": 1.4474439977024698,
+ "grad_norm": 5.1745195388793945,
+ "learning_rate": 5.53130384836301e-06,
+ "loss": 0.3842,
+ "step": 2520
+ },
+ {
+ "epoch": 1.453187823090178,
+ "grad_norm": 8.361491203308105,
+ "learning_rate": 5.473865594485927e-06,
+ "loss": 0.3598,
+ "step": 2530
+ },
+ {
+ "epoch": 1.4589316484778863,
+ "grad_norm": 6.487078666687012,
+ "learning_rate": 5.4164273406088464e-06,
+ "loss": 0.3244,
+ "step": 2540
+ },
+ {
+ "epoch": 1.4646754738655945,
+ "grad_norm": 4.129726409912109,
+ "learning_rate": 5.358989086731764e-06,
+ "loss": 0.3152,
+ "step": 2550
+ },
+ {
+ "epoch": 1.4704192992533027,
+ "grad_norm": 9.363592147827148,
+ "learning_rate": 5.301550832854681e-06,
+ "loss": 0.3321,
+ "step": 2560
+ },
+ {
+ "epoch": 1.476163124641011,
+ "grad_norm": 6.334773063659668,
+ "learning_rate": 5.2441125789776e-06,
+ "loss": 0.3312,
+ "step": 2570
+ },
+ {
+ "epoch": 1.4819069500287192,
+ "grad_norm": 7.404930114746094,
+ "learning_rate": 5.186674325100517e-06,
+ "loss": 0.3739,
+ "step": 2580
+ },
+ {
+ "epoch": 1.4876507754164274,
+ "grad_norm": 7.487016201019287,
+ "learning_rate": 5.1292360712234355e-06,
+ "loss": 0.2865,
+ "step": 2590
+ },
+ {
+ "epoch": 1.4933946008041357,
+ "grad_norm": 13.322307586669922,
+ "learning_rate": 5.071797817346353e-06,
+ "loss": 0.3444,
+ "step": 2600
+ },
+ {
+ "epoch": 1.499138426191844,
+ "grad_norm": 9.053878784179688,
+ "learning_rate": 5.014359563469271e-06,
+ "loss": 0.3799,
+ "step": 2610
+ },
+ {
+ "epoch": 1.5048822515795521,
+ "grad_norm": 4.018943786621094,
+ "learning_rate": 4.956921309592189e-06,
+ "loss": 0.2567,
+ "step": 2620
+ },
+ {
+ "epoch": 1.5106260769672601,
+ "grad_norm": 2.2457354068756104,
+ "learning_rate": 4.899483055715107e-06,
+ "loss": 0.2978,
+ "step": 2630
+ },
+ {
+ "epoch": 1.5163699023549684,
+ "grad_norm": 4.894889831542969,
+ "learning_rate": 4.8420448018380245e-06,
+ "loss": 0.2861,
+ "step": 2640
+ },
+ {
+ "epoch": 1.5221137277426766,
+ "grad_norm": 6.843629360198975,
+ "learning_rate": 4.784606547960942e-06,
+ "loss": 0.3254,
+ "step": 2650
+ },
+ {
+ "epoch": 1.5278575531303848,
+ "grad_norm": 7.173573970794678,
+ "learning_rate": 4.72716829408386e-06,
+ "loss": 0.3481,
+ "step": 2660
+ },
+ {
+ "epoch": 1.533601378518093,
+ "grad_norm": 12.328543663024902,
+ "learning_rate": 4.669730040206778e-06,
+ "loss": 0.3124,
+ "step": 2670
+ },
+ {
+ "epoch": 1.5393452039058013,
+ "grad_norm": 9.337592124938965,
+ "learning_rate": 4.612291786329696e-06,
+ "loss": 0.3005,
+ "step": 2680
+ },
+ {
+ "epoch": 1.5450890292935093,
+ "grad_norm": 5.477969646453857,
+ "learning_rate": 4.5548535324526135e-06,
+ "loss": 0.3457,
+ "step": 2690
+ },
+ {
+ "epoch": 1.5508328546812176,
+ "grad_norm": 5.083920955657959,
+ "learning_rate": 4.497415278575532e-06,
+ "loss": 0.2858,
+ "step": 2700
+ },
+ {
+ "epoch": 1.5565766800689258,
+ "grad_norm": 6.250855445861816,
+ "learning_rate": 4.439977024698449e-06,
+ "loss": 0.2673,
+ "step": 2710
+ },
+ {
+ "epoch": 1.562320505456634,
+ "grad_norm": 6.169952392578125,
+ "learning_rate": 4.382538770821368e-06,
+ "loss": 0.2971,
+ "step": 2720
+ },
+ {
+ "epoch": 1.5680643308443423,
+ "grad_norm": 8.261754989624023,
+ "learning_rate": 4.325100516944285e-06,
+ "loss": 0.3009,
+ "step": 2730
+ },
+ {
+ "epoch": 1.5738081562320505,
+ "grad_norm": 8.477384567260742,
+ "learning_rate": 4.267662263067203e-06,
+ "loss": 0.3017,
+ "step": 2740
+ },
+ {
+ "epoch": 1.5795519816197587,
+ "grad_norm": 8.52374267578125,
+ "learning_rate": 4.210224009190121e-06,
+ "loss": 0.4511,
+ "step": 2750
+ },
+ {
+ "epoch": 1.585295807007467,
+ "grad_norm": 8.646858215332031,
+ "learning_rate": 4.152785755313039e-06,
+ "loss": 0.2699,
+ "step": 2760
+ },
+ {
+ "epoch": 1.5910396323951752,
+ "grad_norm": 7.500174522399902,
+ "learning_rate": 4.095347501435957e-06,
+ "loss": 0.2743,
+ "step": 2770
+ },
+ {
+ "epoch": 1.5967834577828834,
+ "grad_norm": 5.454465389251709,
+ "learning_rate": 4.037909247558874e-06,
+ "loss": 0.3023,
+ "step": 2780
+ },
+ {
+ "epoch": 1.6025272831705917,
+ "grad_norm": 2.6998019218444824,
+ "learning_rate": 3.9804709936817925e-06,
+ "loss": 0.3941,
+ "step": 2790
+ },
+ {
+ "epoch": 1.6082711085583,
+ "grad_norm": 4.594570159912109,
+ "learning_rate": 3.92303273980471e-06,
+ "loss": 0.252,
+ "step": 2800
+ },
+ {
+ "epoch": 1.6140149339460081,
+ "grad_norm": 5.87538480758667,
+ "learning_rate": 3.865594485927628e-06,
+ "loss": 0.3093,
+ "step": 2810
+ },
+ {
+ "epoch": 1.6197587593337164,
+ "grad_norm": 5.358250617980957,
+ "learning_rate": 3.808156232050546e-06,
+ "loss": 0.3674,
+ "step": 2820
+ },
+ {
+ "epoch": 1.6255025847214246,
+ "grad_norm": 4.871222972869873,
+ "learning_rate": 3.7507179781734636e-06,
+ "loss": 0.2407,
+ "step": 2830
+ },
+ {
+ "epoch": 1.6312464101091326,
+ "grad_norm": 6.6836838722229,
+ "learning_rate": 3.693279724296382e-06,
+ "loss": 0.2978,
+ "step": 2840
+ },
+ {
+ "epoch": 1.6369902354968409,
+ "grad_norm": 7.67780065536499,
+ "learning_rate": 3.6358414704192994e-06,
+ "loss": 0.3655,
+ "step": 2850
+ },
+ {
+ "epoch": 1.642734060884549,
+ "grad_norm": 6.890137672424316,
+ "learning_rate": 3.5784032165422173e-06,
+ "loss": 0.3618,
+ "step": 2860
+ },
+ {
+ "epoch": 1.6484778862722573,
+ "grad_norm": 7.769665241241455,
+ "learning_rate": 3.5209649626651356e-06,
+ "loss": 0.2703,
+ "step": 2870
+ },
+ {
+ "epoch": 1.6542217116599656,
+ "grad_norm": 4.888299465179443,
+ "learning_rate": 3.463526708788053e-06,
+ "loss": 0.3257,
+ "step": 2880
+ },
+ {
+ "epoch": 1.6599655370476738,
+ "grad_norm": 4.726266384124756,
+ "learning_rate": 3.406088454910971e-06,
+ "loss": 0.3573,
+ "step": 2890
+ },
+ {
+ "epoch": 1.6657093624353818,
+ "grad_norm": 6.836297035217285,
+ "learning_rate": 3.348650201033889e-06,
+ "loss": 0.3325,
+ "step": 2900
+ },
+ {
+ "epoch": 1.67145318782309,
+ "grad_norm": 6.571508884429932,
+ "learning_rate": 3.2912119471568067e-06,
+ "loss": 0.292,
+ "step": 2910
+ },
+ {
+ "epoch": 1.6771970132107983,
+ "grad_norm": 9.843769073486328,
+ "learning_rate": 3.2337736932797246e-06,
+ "loss": 0.4027,
+ "step": 2920
+ },
+ {
+ "epoch": 1.6829408385985065,
+ "grad_norm": 6.089911937713623,
+ "learning_rate": 3.1763354394026425e-06,
+ "loss": 0.3627,
+ "step": 2930
+ },
+ {
+ "epoch": 1.6886846639862148,
+ "grad_norm": 6.846927165985107,
+ "learning_rate": 3.11889718552556e-06,
+ "loss": 0.3541,
+ "step": 2940
+ },
+ {
+ "epoch": 1.694428489373923,
+ "grad_norm": 10.23181438446045,
+ "learning_rate": 3.0614589316484783e-06,
+ "loss": 0.3418,
+ "step": 2950
+ },
+ {
+ "epoch": 1.7001723147616312,
+ "grad_norm": 5.403523921966553,
+ "learning_rate": 3.0040206777713958e-06,
+ "loss": 0.3498,
+ "step": 2960
+ },
+ {
+ "epoch": 1.7059161401493395,
+ "grad_norm": 8.252917289733887,
+ "learning_rate": 2.9465824238943137e-06,
+ "loss": 0.3401,
+ "step": 2970
+ },
+ {
+ "epoch": 1.7116599655370477,
+ "grad_norm": 7.06523323059082,
+ "learning_rate": 2.889144170017232e-06,
+ "loss": 0.2331,
+ "step": 2980
+ },
+ {
+ "epoch": 1.717403790924756,
+ "grad_norm": 7.739984035491943,
+ "learning_rate": 2.8317059161401494e-06,
+ "loss": 0.3678,
+ "step": 2990
+ },
+ {
+ "epoch": 1.7231476163124642,
+ "grad_norm": 4.021157741546631,
+ "learning_rate": 2.7742676622630677e-06,
+ "loss": 0.3537,
+ "step": 3000
+ },
+ {
+ "epoch": 1.7288914417001724,
+ "grad_norm": 9.555063247680664,
+ "learning_rate": 2.7168294083859852e-06,
+ "loss": 0.2955,
+ "step": 3010
+ },
+ {
+ "epoch": 1.7346352670878806,
+ "grad_norm": 6.779582500457764,
+ "learning_rate": 2.659391154508903e-06,
+ "loss": 0.3139,
+ "step": 3020
+ },
+ {
+ "epoch": 1.7403790924755889,
+ "grad_norm": 2.2541654109954834,
+ "learning_rate": 2.601952900631821e-06,
+ "loss": 0.2643,
+ "step": 3030
+ },
+ {
+ "epoch": 1.746122917863297,
+ "grad_norm": 6.457112789154053,
+ "learning_rate": 2.544514646754739e-06,
+ "loss": 0.328,
+ "step": 3040
+ },
+ {
+ "epoch": 1.7518667432510053,
+ "grad_norm": 11.173613548278809,
+ "learning_rate": 2.4870763928776568e-06,
+ "loss": 0.2807,
+ "step": 3050
+ },
+ {
+ "epoch": 1.7576105686387133,
+ "grad_norm": 7.816757678985596,
+ "learning_rate": 2.4296381390005747e-06,
+ "loss": 0.3215,
+ "step": 3060
+ },
+ {
+ "epoch": 1.7633543940264216,
+ "grad_norm": 9.915975570678711,
+ "learning_rate": 2.372199885123492e-06,
+ "loss": 0.4274,
+ "step": 3070
+ },
+ {
+ "epoch": 1.7690982194141298,
+ "grad_norm": 4.84187650680542,
+ "learning_rate": 2.31476163124641e-06,
+ "loss": 0.2679,
+ "step": 3080
+ },
+ {
+ "epoch": 1.774842044801838,
+ "grad_norm": 7.7299909591674805,
+ "learning_rate": 2.2573233773693283e-06,
+ "loss": 0.37,
+ "step": 3090
+ },
+ {
+ "epoch": 1.7805858701895463,
+ "grad_norm": 6.248807430267334,
+ "learning_rate": 2.1998851234922462e-06,
+ "loss": 0.3418,
+ "step": 3100
+ },
+ {
+ "epoch": 1.7863296955772543,
+ "grad_norm": 7.012147426605225,
+ "learning_rate": 2.1424468696151637e-06,
+ "loss": 0.3138,
+ "step": 3110
+ },
+ {
+ "epoch": 1.7920735209649625,
+ "grad_norm": 4.839092254638672,
+ "learning_rate": 2.0850086157380816e-06,
+ "loss": 0.3464,
+ "step": 3120
+ },
+ {
+ "epoch": 1.7978173463526708,
+ "grad_norm": 9.893400192260742,
+ "learning_rate": 2.0275703618609995e-06,
+ "loss": 0.315,
+ "step": 3130
+ },
+ {
+ "epoch": 1.803561171740379,
+ "grad_norm": 4.713288307189941,
+ "learning_rate": 1.9701321079839174e-06,
+ "loss": 0.2864,
+ "step": 3140
+ },
+ {
+ "epoch": 1.8093049971280872,
+ "grad_norm": 7.380845546722412,
+ "learning_rate": 1.9126938541068353e-06,
+ "loss": 0.3599,
+ "step": 3150
+ },
+ {
+ "epoch": 1.8150488225157955,
+ "grad_norm": 8.279121398925781,
+ "learning_rate": 1.8552556002297532e-06,
+ "loss": 0.2871,
+ "step": 3160
+ },
+ {
+ "epoch": 1.8207926479035037,
+ "grad_norm": 8.517400741577148,
+ "learning_rate": 1.797817346352671e-06,
+ "loss": 0.2782,
+ "step": 3170
+ },
+ {
+ "epoch": 1.826536473291212,
+ "grad_norm": 8.836138725280762,
+ "learning_rate": 1.740379092475589e-06,
+ "loss": 0.3053,
+ "step": 3180
+ },
+ {
+ "epoch": 1.8322802986789202,
+ "grad_norm": 8.175244331359863,
+ "learning_rate": 1.6829408385985068e-06,
+ "loss": 0.2729,
+ "step": 3190
+ },
+ {
+ "epoch": 1.8380241240666284,
+ "grad_norm": 8.452549934387207,
+ "learning_rate": 1.6255025847214245e-06,
+ "loss": 0.3369,
+ "step": 3200
+ },
+ {
+ "epoch": 1.8437679494543366,
+ "grad_norm": 4.472502708435059,
+ "learning_rate": 1.5680643308443424e-06,
+ "loss": 0.3386,
+ "step": 3210
+ },
+ {
+ "epoch": 1.8495117748420449,
+ "grad_norm": 6.361071586608887,
+ "learning_rate": 1.5106260769672603e-06,
+ "loss": 0.2744,
+ "step": 3220
2265
+ },
2266
+ {
2267
+ "epoch": 1.855255600229753,
2268
+ "grad_norm": 5.243939399719238,
2269
+ "learning_rate": 1.4531878230901784e-06,
2270
+ "loss": 0.263,
2271
+ "step": 3230
2272
+ },
2273
+ {
2274
+ "epoch": 1.8609994256174613,
2275
+ "grad_norm": 11.582174301147461,
2276
+ "learning_rate": 1.3957495692130959e-06,
2277
+ "loss": 0.4315,
2278
+ "step": 3240
2279
+ },
2280
+ {
2281
+ "epoch": 1.8667432510051696,
2282
+ "grad_norm": 6.288567543029785,
2283
+ "learning_rate": 1.338311315336014e-06,
2284
+ "loss": 0.339,
2285
+ "step": 3250
2286
+ },
2287
+ {
2288
+ "epoch": 1.8724870763928778,
2289
+ "grad_norm": 12.038402557373047,
2290
+ "learning_rate": 1.2808730614589319e-06,
2291
+ "loss": 0.2851,
2292
+ "step": 3260
2293
+ },
2294
+ {
2295
+ "epoch": 1.8782309017805858,
2296
+ "grad_norm": 6.310272216796875,
2297
+ "learning_rate": 1.2234348075818495e-06,
2298
+ "loss": 0.2127,
2299
+ "step": 3270
2300
+ },
2301
+ {
2302
+ "epoch": 1.883974727168294,
2303
+ "grad_norm": 7.386268615722656,
2304
+ "learning_rate": 1.1659965537047674e-06,
2305
+ "loss": 0.3338,
2306
+ "step": 3280
2307
+ },
2308
+ {
2309
+ "epoch": 1.8897185525560023,
2310
+ "grad_norm": 7.40879487991333,
2311
+ "learning_rate": 1.1085582998276853e-06,
2312
+ "loss": 0.2658,
2313
+ "step": 3290
2314
+ },
2315
+ {
2316
+ "epoch": 1.8954623779437105,
2317
+ "grad_norm": 13.936086654663086,
2318
+ "learning_rate": 1.0511200459506032e-06,
2319
+ "loss": 0.3539,
2320
+ "step": 3300
2321
+ },
2322
+ {
2323
+ "epoch": 1.9012062033314188,
2324
+ "grad_norm": 5.246440410614014,
2325
+ "learning_rate": 9.936817920735211e-07,
2326
+ "loss": 0.3869,
2327
+ "step": 3310
2328
+ },
2329
+ {
2330
+ "epoch": 1.9069500287191268,
2331
+ "grad_norm": 7.692013740539551,
2332
+ "learning_rate": 9.36243538196439e-07,
2333
+ "loss": 0.2657,
2334
+ "step": 3320
2335
+ },
2336
+ {
2337
+ "epoch": 1.912693854106835,
2338
+ "grad_norm": 10.840241432189941,
2339
+ "learning_rate": 8.788052843193568e-07,
2340
+ "loss": 0.3165,
2341
+ "step": 3330
2342
+ },
2343
+ {
2344
+ "epoch": 1.9184376794945432,
2345
+ "grad_norm": 5.2672810554504395,
2346
+ "learning_rate": 8.213670304422747e-07,
2347
+ "loss": 0.2489,
2348
+ "step": 3340
2349
+ },
2350
+ {
2351
+ "epoch": 1.9241815048822515,
2352
+ "grad_norm": 3.8065874576568604,
2353
+ "learning_rate": 7.639287765651925e-07,
2354
+ "loss": 0.3271,
2355
+ "step": 3350
2356
+ },
2357
+ {
2358
+ "epoch": 1.9299253302699597,
2359
+ "grad_norm": 10.850006103515625,
2360
+ "learning_rate": 7.064905226881104e-07,
2361
+ "loss": 0.3887,
2362
+ "step": 3360
2363
+ },
2364
+ {
2365
+ "epoch": 1.935669155657668,
2366
+ "grad_norm": 4.718282699584961,
2367
+ "learning_rate": 6.490522688110281e-07,
2368
+ "loss": 0.3125,
2369
+ "step": 3370
2370
+ },
2371
+ {
2372
+ "epoch": 1.9414129810453762,
2373
+ "grad_norm": 5.016386032104492,
2374
+ "learning_rate": 5.91614014933946e-07,
2375
+ "loss": 0.3018,
2376
+ "step": 3380
2377
+ },
2378
+ {
2379
+ "epoch": 1.9471568064330844,
2380
+ "grad_norm": 11.608333587646484,
2381
+ "learning_rate": 5.341757610568639e-07,
2382
+ "loss": 0.2911,
2383
+ "step": 3390
2384
+ },
2385
+ {
2386
+ "epoch": 1.9529006318207927,
2387
+ "grad_norm": 7.1915459632873535,
2388
+ "learning_rate": 4.7673750717978177e-07,
2389
+ "loss": 0.2974,
2390
+ "step": 3400
2391
+ },
2392
+ {
2393
+ "epoch": 1.9586444572085009,
2394
+ "grad_norm": 9.097732543945312,
2395
+ "learning_rate": 4.192992533026996e-07,
2396
+ "loss": 0.3127,
2397
+ "step": 3410
2398
+ },
2399
+ {
2400
+ "epoch": 1.9643882825962091,
2401
+ "grad_norm": 5.897636890411377,
2402
+ "learning_rate": 3.6186099942561744e-07,
2403
+ "loss": 0.3128,
2404
+ "step": 3420
2405
+ },
2406
+ {
2407
+ "epoch": 1.9701321079839174,
2408
+ "grad_norm": 5.362806797027588,
2409
+ "learning_rate": 3.0442274554853533e-07,
2410
+ "loss": 0.2742,
2411
+ "step": 3430
2412
+ },
2413
+ {
2414
+ "epoch": 1.9758759333716256,
2415
+ "grad_norm": 8.615423202514648,
2416
+ "learning_rate": 2.469844916714532e-07,
2417
+ "loss": 0.3787,
2418
+ "step": 3440
2419
+ },
2420
+ {
2421
+ "epoch": 1.9816197587593338,
2422
+ "grad_norm": 8.756539344787598,
2423
+ "learning_rate": 1.8954623779437106e-07,
2424
+ "loss": 0.3705,
2425
+ "step": 3450
2426
+ },
2427
+ {
2428
+ "epoch": 1.987363584147042,
2429
+ "grad_norm": 11.84693717956543,
2430
+ "learning_rate": 1.3210798391728893e-07,
2431
+ "loss": 0.3027,
2432
+ "step": 3460
2433
+ },
2434
+ {
2435
+ "epoch": 1.9931074095347503,
2436
+ "grad_norm": 11.413233757019043,
2437
+ "learning_rate": 7.466973004020678e-08,
2438
+ "loss": 0.3549,
2439
+ "step": 3470
2440
+ },
2441
+ {
2442
+ "epoch": 1.9988512349224583,
2443
+ "grad_norm": 5.79984188079834,
2444
+ "learning_rate": 1.7231476163124642e-08,
2445
+ "loss": 0.3109,
2446
+ "step": 3480
2447
+ }
2448
+ ],
2449
+ "logging_steps": 10,
2450
+ "max_steps": 3482,
2451
+ "num_input_tokens_seen": 0,
2452
+ "num_train_epochs": 2,
2453
+ "save_steps": 500,
2454
+ "stateful_callbacks": {
2455
+ "TrainerControl": {
2456
+ "args": {
2457
+ "should_epoch_stop": false,
2458
+ "should_evaluate": false,
2459
+ "should_log": false,
2460
+ "should_save": true,
2461
+ "should_training_stop": true
2462
+ },
2463
+ "attributes": {}
2464
+ }
2465
+ },
2466
+ "total_flos": 7376844496355328.0,
2467
+ "train_batch_size": 16,
2468
+ "trial_name": null,
2469
+ "trial_params": null
2470
+ }
checkpoint-3482/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7e7cb6431ba79d66c2c9c534739c31af31901b4f69b86ef7cd6fa47d58097e4a
+ size 5368
checkpoint-3482/vocab.txt ADDED
The diff for this file is too large to render. See raw diff
 
config.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "activation": "gelu",
+ "architectures": [
+ "DistilBertForSequenceClassification"
+ ],
+ "attention_dropout": 0.1,
+ "dim": 768,
+ "dropout": 0.3,
+ "dtype": "float32",
+ "hidden_dim": 3072,
+ "initializer_range": 0.02,
+ "max_position_embeddings": 512,
+ "model_type": "distilbert",
+ "n_heads": 12,
+ "n_layers": 6,
+ "pad_token_id": 0,
+ "problem_type": "single_label_classification",
+ "qa_dropout": 0.1,
+ "seq_classif_dropout": 0.2,
+ "sinusoidal_pos_embds": false,
+ "tie_weights_": true,
+ "transformers_version": "4.57.1",
+ "vocab_size": 30522
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:44a9699f5eae1c12be68d7d6783b3db0d0136ec5a126bd85d7435845fcefef10
+ size 267832560
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+ "cls_token": "[CLS]",
+ "mask_token": "[MASK]",
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "unk_token": "[UNK]"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "101": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "102": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "103": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "clean_up_tokenization_spaces": false,
+ "cls_token": "[CLS]",
+ "do_lower_case": true,
+ "extra_special_tokens": {},
+ "mask_token": "[MASK]",
+ "model_max_length": 512,
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "DistilBertTokenizer",
+ "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff
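The `tokenizer_config.json` above registers the five standard BERT WordPiece special tokens at their usual `bert-base-uncased` vocabulary ids. A minimal sketch cross-checking that its `added_tokens_decoder` agrees with `special_tokens_map.json` and with `"pad_token_id": 0` in `config.json` (all three files are shown in this diff):

```python
# Consistency check across the three config files added in this commit.
added_tokens_decoder = {
    0: "[PAD]", 100: "[UNK]", 101: "[CLS]", 102: "[SEP]", 103: "[MASK]",
}  # from tokenizer_config.json
special_tokens_map = {
    "cls_token": "[CLS]", "mask_token": "[MASK]", "pad_token": "[PAD]",
    "sep_token": "[SEP]", "unk_token": "[UNK]",
}  # from special_tokens_map.json

# Every role in the map must name a token registered as special.
registered = set(added_tokens_decoder.values())
assert set(special_tokens_map.values()) <= registered

# The id registered for [PAD] must match config.json's "pad_token_id".
pad_id = next(i for i, tok in added_tokens_decoder.items() if tok == "[PAD]")
print(pad_id)  # 0, matching "pad_token_id" in config.json
```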