bobox committed on
Commit
472b81e
1 Parent(s): 5deaae0

Training in progress, step 305, checkpoint

checkpoint-305/1_AdvancedWeightedPooling/config.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "embed_dim": 768,
+   "num_heads": 4,
+   "dropout": 0.05,
+   "bias": true,
+   "gate_min": 0.1,
+   "gate_max": 0.9,
+   "gate_dropout": 0.1,
+   "dropout_gate_open": 0.05,
+   "dropout_gate_close": 0.05,
+   "CLS_self_attn": 0
+ }
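
The pooling config above can be sanity-checked before it is handed to a module. The following is a minimal sketch, not the author's loader: the `PoolingConfig` class and its `check` method are hypothetical names, and the idea that `gate_min` and `gate_max` bound a gate value inside [0, 1] is an assumption drawn only from the key names. The divisibility check reflects the usual requirement of multi-head attention that `embed_dim` split evenly across `num_heads`.

```python
import json
from dataclasses import dataclass

# The JSON body of 1_AdvancedWeightedPooling/config.json, copied from the diff.
CONFIG_JSON = """
{
  "embed_dim": 768,
  "num_heads": 4,
  "dropout": 0.05,
  "bias": true,
  "gate_min": 0.1,
  "gate_max": 0.9,
  "gate_dropout": 0.1,
  "dropout_gate_open": 0.05,
  "dropout_gate_close": 0.05,
  "CLS_self_attn": 0
}
"""


@dataclass(frozen=True)
class PoolingConfig:  # hypothetical container, not part of the repository
    embed_dim: int
    num_heads: int
    dropout: float
    bias: bool
    gate_min: float
    gate_max: float
    gate_dropout: float
    dropout_gate_open: float
    dropout_gate_close: float
    CLS_self_attn: int

    def check(self) -> int:
        """Run basic sanity checks and return the per-head dimension."""
        if self.embed_dim % self.num_heads != 0:
            raise ValueError("embed_dim must be divisible by num_heads")
        # Assumption: the gate bounds describe an interval inside [0, 1].
        if not 0.0 <= self.gate_min < self.gate_max <= 1.0:
            raise ValueError("expected 0 <= gate_min < gate_max <= 1")
        for name in ("dropout", "gate_dropout",
                     "dropout_gate_open", "dropout_gate_close"):
            if not 0.0 <= getattr(self, name) < 1.0:
                raise ValueError(f"{name} must be a probability in [0, 1)")
        return self.embed_dim // self.num_heads


config = PoolingConfig(**json.loads(CONFIG_JSON))
head_dim = config.check()
print(f"{config.num_heads} heads of dimension {head_dim}")
```

Using a frozen dataclass means an unexpected or missing key fails loudly at load time instead of surfacing later as a shape error.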
checkpoint-305/1_AdvancedWeightedPooling/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:431d905931e29442c165bd526667d99310cd89a28b0f15a6b3f5c174b5ac4946
+ size 18937587
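
The three added lines are a Git LFS pointer, not the weights themselves: the repository stores only the hash and byte size of `pytorch_model.bin`. As a rough illustration, the pointer can be parsed with a few lines of Python. The helper name `parse_lfs_pointer` is hypothetical, and the sketch handles only the three keys that appear in this diff.

```python
# Pointer text copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:431d905931e29442c165bd526667d99310cd89a28b0f15a6b3f5c174b5ac4946
size 18937587
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line and return the fields of an LFS pointer."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        raise ValueError("not a Git LFS pointer")
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(f"{pointer['hash_algo']} digest, {pointer['size'] / 2**20:.1f} MiB")
```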
checkpoint-305/README.md ADDED
@@ -0,0 +1,1180 @@
1
+ ---
2
+ base_model: microsoft/deberta-v3-small
3
+ library_name: sentence-transformers
4
+ metrics:
5
+ - pearson_cosine
6
+ - spearman_cosine
7
+ - pearson_manhattan
8
+ - spearman_manhattan
9
+ - pearson_euclidean
10
+ - spearman_euclidean
11
+ - pearson_dot
12
+ - spearman_dot
13
+ - pearson_max
14
+ - spearman_max
15
+ - cosine_accuracy
16
+ - cosine_accuracy_threshold
17
+ - cosine_f1
18
+ - cosine_f1_threshold
19
+ - cosine_precision
20
+ - cosine_recall
21
+ - cosine_ap
22
+ - dot_accuracy
23
+ - dot_accuracy_threshold
24
+ - dot_f1
25
+ - dot_f1_threshold
26
+ - dot_precision
27
+ - dot_recall
28
+ - dot_ap
29
+ - manhattan_accuracy
30
+ - manhattan_accuracy_threshold
31
+ - manhattan_f1
32
+ - manhattan_f1_threshold
33
+ - manhattan_precision
34
+ - manhattan_recall
35
+ - manhattan_ap
36
+ - euclidean_accuracy
37
+ - euclidean_accuracy_threshold
38
+ - euclidean_f1
39
+ - euclidean_f1_threshold
40
+ - euclidean_precision
41
+ - euclidean_recall
42
+ - euclidean_ap
43
+ - max_accuracy
44
+ - max_accuracy_threshold
45
+ - max_f1
46
+ - max_f1_threshold
47
+ - max_precision
48
+ - max_recall
49
+ - max_ap
50
+ pipeline_tag: sentence-similarity
51
+ tags:
52
+ - sentence-transformers
53
+ - sentence-similarity
54
+ - feature-extraction
55
+ - generated_from_trainer
56
+ - dataset_size:32500
57
+ - loss:GISTEmbedLoss
58
+ widget:
59
+ - source_sentence: A picture of a white gas range with figurines above.
60
+ sentences:
61
+ - A nerdy woman brushing her teeth with a friend nearby.
62
+ - a white stove turned off with a digital clock
63
+ - The plasma membrane also contains other molecules, primarily other lipids and
64
+ proteins. The green molecules in Figure above , for example, are the lipid cholesterol.
65
+ Molecules of cholesterol help the plasma membrane keep its shape. Many of the
66
+ proteins in the plasma membrane assist other substances in crossing the membrane.
67
+ - source_sentence: who makes the kentucky derby garland of roses
68
+ sentences:
69
+ - Accrington strengthened their position in the play-off places with a hard-fought
70
+ win over struggling Dagenham.
71
+ - "tidal energy can be used to produce electricity. Ocean thermal is energy derived\
72
+ \ from waves and also from tidal waves. \n Ocean thermal energy can be used to\
73
+ \ produce electricity."
74
+ - Kentucky Derby Trophy The Kroger Company has been the official florist of the
75
+ Kentucky Derby since 1987. After taking over the duties from the Kingsley Walker
76
+ florist, Kroger began constructing the prestigious garland in one of its local
77
+ stores for the public to view on Derby Eve. The preservation of the garland and
78
+ crowds of spectators watching its construction are a testament to the prestige
79
+ and mystique of the Garland of Roses.
80
+ - source_sentence: what is the difference between a general sense and a special sense?
81
+ sentences:
82
+ - 'Ian Curtis ( of Touching from a distance) Ian Kevin Curtis was an English musician
83
+ and singer-songwriter. He is best known as the lead singer and lyricist of the
84
+ post-punk band Joy Division. Joy Division released its debut album, Unknown Pleasures,
85
+ in 1979 and recorded its follow-up, Closer, in 1980. Curtis, who suffered from
86
+ epilepsy and depression, committed suicide on 18 May 1980, on the eve of Joy Division''s
87
+ first North American tour, resulting in the band''s dissolution and the subsequent
88
+ formation of New Order. Curtis was known for his baritone voice, dance style,
89
+ and songwriting filled with imagery of desolation, emptiness and alienation. In
90
+ 1995, Curtis''s widow Deborah published Touching from a Distance: Ian Curtis and
91
+ Joy Division, a biography of the singer. His life and death Ian Kevin Curtis was
92
+ an English musician and singer-songwriter. He is best known as the lead singer
93
+ and lyricist of the post-punk band Joy Division. Joy Division released its debut
94
+ album, Unknown Pleasures, in 1979 and recorded its follow-up, Closer, in 1980.
95
+ Curtis, who suffered from epilepsy and depression, committed suicide on 18 May
96
+ 1980, on the eve of Joy Division''s first North American tour, resulting in the
97
+ band''s dissolution and the subsequent formation of New Order. Curtis was known
98
+ for his baritone voice, dance style, and songwriting filled with imagery of desolation,
99
+ emptiness and alienation. In 1995, Curtis''s widow Deborah published Touching
100
+ from a Distance: Ian Curtis and Joy Division, a biography of the singer. His life
101
+ and death have been dramatised in the films 24 Hour Party People (2002) and Control
102
+ (2007). ...more'
103
+ - The human body has two basic types of senses, called special senses and general
104
+ senses. Special senses have specialized sense organs that gather sensory information
105
+ and change it into nerve impulses. ... General senses, in contrast, are all associated
106
+ with the sense of touch. They lack special sense organs.
107
+ - Captain Hook Barrie states in the novel that "Hook was not his true name. To reveal
108
+ who he really was would even at this date set the country in a blaze", and relates
109
+ that Peter Pan began their rivalry by feeding the pirate's hand to the crocodile.
110
+ He is said to be "Blackbeard's bo'sun" and "the only man of whom Barbecue was
111
+ afraid".[5] (In Robert Louis Stevenson's Treasure Island, one of the names Long
112
+ John Silver goes by is Barbecue.)[6]
113
+ - source_sentence: Retzius was born in Stockholm , son of the anatomist Anders Jahan
114
+ Retzius ( and grandson of the naturalist and chemist Anders Retzius ) .
115
+ sentences:
116
+ - Retzius was born in Stockholm , the son of anatomist Anders Jahan Retzius ( and
117
+ grandson of the naturalist and chemist Anders Retzius ) .
118
+ - As of 14 March , over 156,000 cases of COVID-19 have been reported in around 140
119
+ countries and territories ; more than 5,800 people have died from the disease
120
+ and around 75,000 have recovered .
121
+ - A person sitting on a stool on the street.
122
+ - source_sentence: who was the first person who made the violin
123
+ sentences:
124
+ - Alice in Chains Alice in Chains is an American rock band from Seattle, Washington,
125
+ formed in 1987 by guitarist and vocalist Jerry Cantrell and drummer Sean Kinney,[1]
126
+ who recruited bassist Mike Starr[1] and lead vocalist Layne Staley.[1][2][3] Starr
127
+ was replaced by Mike Inez in 1993.[4] After Staley's death in 2002, William DuVall
128
+ joined in 2006 as co-lead vocalist and rhythm guitarist. The band took its name
129
+ from Staley's previous group, the glam metal band Alice N' Chains.[5][2]
130
+ - as distance from an object decreases , that object will appear larger
131
+ - Violin The first makers of violins probably borrowed from various developments
132
+ of the Byzantine lira. These included the rebec;[13] the Arabic rebab; the vielle
133
+ (also known as the fidel or viuola); and the lira da braccio[11][14] The violin
134
+ in its present form emerged in early 16th-century northern Italy. The earliest
135
+ pictures of violins, albeit with three strings, are seen in northern Italy around
136
+ 1530, at around the same time as the words "violino" and "vyollon" are seen in
137
+ Italian and French documents. One of the earliest explicit descriptions of the
138
+ instrument, including its tuning, is from the Epitome musical by Jambe de Fer,
139
+ published in Lyon in 1556.[15] By this time, the violin had already begun to spread
140
+ throughout Europe.
141
+ model-index:
142
+ - name: SentenceTransformer based on microsoft/deberta-v3-small
143
+ results:
144
+ - task:
145
+ type: semantic-similarity
146
+ name: Semantic Similarity
147
+ dataset:
148
+ name: sts test
149
+ type: sts-test
150
+ metrics:
151
+ - type: pearson_cosine
152
+ value: 0.1561600438268545
153
+ name: Pearson Cosine
154
+ - type: spearman_cosine
155
+ value: 0.22356441354815124
156
+ name: Spearman Cosine
157
+ - type: pearson_manhattan
158
+ value: 0.2216924674035587
159
+ name: Pearson Manhattan
160
+ - type: spearman_manhattan
161
+ value: 0.24997065610359018
162
+ name: Spearman Manhattan
163
+ - type: pearson_euclidean
164
+ value: 0.1908690981304929
165
+ name: Pearson Euclidean
166
+ - type: spearman_euclidean
167
+ value: 0.22363767136304896
168
+ name: Spearman Euclidean
169
+ - type: pearson_dot
170
+ value: 0.15588248423807516
171
+ name: Pearson Dot
172
+ - type: spearman_dot
173
+ value: 0.22337189362164545
174
+ name: Spearman Dot
175
+ - type: pearson_max
176
+ value: 0.2216924674035587
177
+ name: Pearson Max
178
+ - type: spearman_max
179
+ value: 0.24997065610359018
180
+ name: Spearman Max
181
+ - task:
182
+ type: binary-classification
183
+ name: Binary Classification
184
+ dataset:
185
+ name: allNLI dev
186
+ type: allNLI-dev
187
+ metrics:
188
+ - type: cosine_accuracy
189
+ value: 0.666015625
190
+ name: Cosine Accuracy
191
+ - type: cosine_accuracy_threshold
192
+ value: 0.9797871112823486
193
+ name: Cosine Accuracy Threshold
194
+ - type: cosine_f1
195
+ value: 0.504258943781942
196
+ name: Cosine F1
197
+ - type: cosine_f1_threshold
198
+ value: 0.8929213285446167
199
+ name: Cosine F1 Threshold
200
+ - type: cosine_precision
201
+ value: 0.357487922705314
202
+ name: Cosine Precision
203
+ - type: cosine_recall
204
+ value: 0.8554913294797688
205
+ name: Cosine Recall
206
+ - type: cosine_ap
207
+ value: 0.4008449937025217
208
+ name: Cosine Ap
209
+ - type: dot_accuracy
210
+ value: 0.666015625
211
+ name: Dot Accuracy
212
+ - type: dot_accuracy_threshold
213
+ value: 752.6634521484375
214
+ name: Dot Accuracy Threshold
215
+ - type: dot_f1
216
+ value: 0.504258943781942
217
+ name: Dot F1
218
+ - type: dot_f1_threshold
219
+ value: 685.9220581054688
220
+ name: Dot F1 Threshold
221
+ - type: dot_precision
222
+ value: 0.357487922705314
223
+ name: Dot Precision
224
+ - type: dot_recall
225
+ value: 0.8554913294797688
226
+ name: Dot Recall
227
+ - type: dot_ap
228
+ value: 0.40071344979441287
229
+ name: Dot Ap
230
+ - type: manhattan_accuracy
231
+ value: 0.66796875
232
+ name: Manhattan Accuracy
233
+ - type: manhattan_accuracy_threshold
234
+ value: 144.52613830566406
235
+ name: Manhattan Accuracy Threshold
236
+ - type: manhattan_f1
237
+ value: 0.5075987841945289
238
+ name: Manhattan F1
239
+ - type: manhattan_f1_threshold
240
+ value: 267.046875
241
+ name: Manhattan F1 Threshold
242
+ - type: manhattan_precision
243
+ value: 0.3443298969072165
244
+ name: Manhattan Precision
245
+ - type: manhattan_recall
246
+ value: 0.9653179190751445
247
+ name: Manhattan Recall
248
+ - type: manhattan_ap
249
+ value: 0.4008700157620745
250
+ name: Manhattan Ap
251
+ - type: euclidean_accuracy
252
+ value: 0.666015625
253
+ name: Euclidean Accuracy
254
+ - type: euclidean_accuracy_threshold
255
+ value: 5.572628974914551
256
+ name: Euclidean Accuracy Threshold
257
+ - type: euclidean_f1
258
+ value: 0.504258943781942
259
+ name: Euclidean F1
260
+ - type: euclidean_f1_threshold
261
+ value: 12.826179504394531
262
+ name: Euclidean F1 Threshold
263
+ - type: euclidean_precision
264
+ value: 0.357487922705314
265
+ name: Euclidean Precision
266
+ - type: euclidean_recall
267
+ value: 0.8554913294797688
268
+ name: Euclidean Recall
269
+ - type: euclidean_ap
270
+ value: 0.40083962142052487
271
+ name: Euclidean Ap
272
+ - type: max_accuracy
273
+ value: 0.66796875
274
+ name: Max Accuracy
275
+ - type: max_accuracy_threshold
276
+ value: 752.6634521484375
277
+ name: Max Accuracy Threshold
278
+ - type: max_f1
279
+ value: 0.5075987841945289
280
+ name: Max F1
281
+ - type: max_f1_threshold
282
+ value: 685.9220581054688
283
+ name: Max F1 Threshold
284
+ - type: max_precision
285
+ value: 0.357487922705314
286
+ name: Max Precision
287
+ - type: max_recall
288
+ value: 0.9653179190751445
289
+ name: Max Recall
290
+ - type: max_ap
291
+ value: 0.4008700157620745
292
+ name: Max Ap
293
+ - task:
294
+ type: binary-classification
295
+ name: Binary Classification
296
+ dataset:
297
+ name: Qnli dev
298
+ type: Qnli-dev
299
+ metrics:
300
+ - type: cosine_accuracy
301
+ value: 0.591796875
302
+ name: Cosine Accuracy
303
+ - type: cosine_accuracy_threshold
304
+ value: 0.9479926824569702
305
+ name: Cosine Accuracy Threshold
306
+ - type: cosine_f1
307
+ value: 0.6291834002677376
308
+ name: Cosine F1
309
+ - type: cosine_f1_threshold
310
+ value: 0.7761930823326111
311
+ name: Cosine F1 Threshold
312
+ - type: cosine_precision
313
+ value: 0.4598825831702544
314
+ name: Cosine Precision
315
+ - type: cosine_recall
316
+ value: 0.9957627118644068
317
+ name: Cosine Recall
318
+ - type: cosine_ap
319
+ value: 0.5658036772817674
320
+ name: Cosine Ap
321
+ - type: dot_accuracy
322
+ value: 0.59375
323
+ name: Dot Accuracy
324
+ - type: dot_accuracy_threshold
325
+ value: 724.091064453125
326
+ name: Dot Accuracy Threshold
327
+ - type: dot_f1
328
+ value: 0.6291834002677376
329
+ name: Dot F1
330
+ - type: dot_f1_threshold
331
+ value: 596.2498779296875
332
+ name: Dot F1 Threshold
333
+ - type: dot_precision
334
+ value: 0.4598825831702544
335
+ name: Dot Precision
336
+ - type: dot_recall
337
+ value: 0.9957627118644068
338
+ name: Dot Recall
339
+ - type: dot_ap
340
+ value: 0.5657459555147606
341
+ name: Dot Ap
342
+ - type: manhattan_accuracy
343
+ value: 0.6171875
344
+ name: Manhattan Accuracy
345
+ - type: manhattan_accuracy_threshold
346
+ value: 202.07958984375
347
+ name: Manhattan Accuracy Threshold
348
+ - type: manhattan_f1
349
+ value: 0.6291834002677376
350
+ name: Manhattan F1
351
+ - type: manhattan_f1_threshold
352
+ value: 307.9236145019531
353
+ name: Manhattan F1 Threshold
354
+ - type: manhattan_precision
355
+ value: 0.4598825831702544
356
+ name: Manhattan Precision
357
+ - type: manhattan_recall
358
+ value: 0.9957627118644068
359
+ name: Manhattan Recall
360
+ - type: manhattan_ap
361
+ value: 0.5891966424964378
362
+ name: Manhattan Ap
363
+ - type: euclidean_accuracy
364
+ value: 0.591796875
365
+ name: Euclidean Accuracy
366
+ - type: euclidean_accuracy_threshold
367
+ value: 8.938886642456055
368
+ name: Euclidean Accuracy Threshold
369
+ - type: euclidean_f1
370
+ value: 0.6291834002677376
371
+ name: Euclidean F1
372
+ - type: euclidean_f1_threshold
373
+ value: 18.542938232421875
374
+ name: Euclidean F1 Threshold
375
+ - type: euclidean_precision
376
+ value: 0.4598825831702544
377
+ name: Euclidean Precision
378
+ - type: euclidean_recall
379
+ value: 0.9957627118644068
380
+ name: Euclidean Recall
381
+ - type: euclidean_ap
382
+ value: 0.5658036772817674
383
+ name: Euclidean Ap
384
+ - type: max_accuracy
385
+ value: 0.6171875
386
+ name: Max Accuracy
387
+ - type: max_accuracy_threshold
388
+ value: 724.091064453125
389
+ name: Max Accuracy Threshold
390
+ - type: max_f1
391
+ value: 0.6291834002677376
392
+ name: Max F1
393
+ - type: max_f1_threshold
394
+ value: 596.2498779296875
395
+ name: Max F1 Threshold
396
+ - type: max_precision
397
+ value: 0.4598825831702544
398
+ name: Max Precision
399
+ - type: max_recall
400
+ value: 0.9957627118644068
401
+ name: Max Recall
402
+ - type: max_ap
403
+ value: 0.5891966424964378
404
+ name: Max Ap
405
+ ---
406
+
407
+ # SentenceTransformer based on microsoft/deberta-v3-small
408
+
409
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
410
+
411
+ ## Model Details
412
+
413
+ ### Model Description
414
+ - **Model Type:** Sentence Transformer
415
+ - **Base model:** [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small) <!-- at revision a36c739020e01763fe789b4b85e2df55d6180012 -->
416
+ - **Maximum Sequence Length:** 512 tokens
417
+ - **Output Dimensionality:** 768 dimensions
418
+ - **Similarity Function:** Cosine Similarity
419
+ <!-- - **Training Dataset:** Unknown -->
420
+ <!-- - **Language:** Unknown -->
421
+ <!-- - **License:** Unknown -->
422
+
423
+ ### Model Sources
424
+
425
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
426
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
427
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
428
+
429
+ ### Full Model Architecture
430
+
431
+ ```
432
+ SentenceTransformer(
433
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model
434
+ (1): AdvancedWeightedPooling(
435
+ (linear_cls_pj): Linear(in_features=768, out_features=768, bias=True)
436
+ (linear_cls_Qpj): Linear(in_features=768, out_features=768, bias=True)
437
+ (linear_mean_pj): Linear(in_features=768, out_features=768, bias=True)
438
+ (linear_attnOut): Linear(in_features=768, out_features=768, bias=True)
439
+ (mha): MultiheadAttention(
440
+ (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
441
+ )
442
+ (layernorm_output): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
443
+ (layernorm_weightedPooing): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
444
+ (layernorm_pjCls): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
445
+ (layernorm_pjMean): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
446
+ (layernorm_attnOut): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
447
+ )
448
+ )
449
+ ```
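
As a plausibility check, the printed pooling architecture can be compared with the 18,937,587-byte `pytorch_model.bin` recorded in this commit. The sketch below counts parameters by hand under several assumptions that the source does not confirm: the `MultiheadAttention` layer uses a packed input projection with bias (its weights are plain parameters, so they do not show up as a submodule in the printout), all tensors are stored as float32, and the module holds no parameters beyond the printed submodules.

```python
EMBED_DIM = 768  # from the architecture printout and config.json


def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Weight matrix plus optional bias vector."""
    return in_features * out_features + (out_features if bias else 0)


def layernorm_params(dim: int) -> int:
    """Elementwise affine LayerNorm: one scale and one shift per feature."""
    return 2 * dim


def mha_params(embed_dim: int) -> int:
    """Assumed layout: packed Q/K/V projection with bias, plus out_proj."""
    in_proj = 3 * embed_dim * embed_dim + 3 * embed_dim
    out_proj = linear_params(embed_dim, embed_dim)
    return in_proj + out_proj


total = (
    4 * linear_params(EMBED_DIM, EMBED_DIM)  # cls_pj, cls_Qpj, mean_pj, attnOut
    + mha_params(EMBED_DIM)                  # mha
    + 5 * layernorm_params(EMBED_DIM)        # the five LayerNorm modules
)
fp32_bytes = total * 4
checkpoint_bytes = 18_937_587  # "size" field of the LFS pointer in this commit

print(total, fp32_bytes, checkpoint_bytes - fp32_bytes)
```

Under these assumptions the tensors account for all but a few kilobytes of the file, which would be consistent with serialization overhead. That reading is an inference from the numbers, not something the commit states.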
450
+
451
+ ## Usage
452
+
453
+ ### Direct Usage (Sentence Transformers)
454
+
455
+ First install the Sentence Transformers library:
456
+
457
+ ```bash
458
+ pip install -U sentence-transformers
459
+ ```
460
+
461
+ Then you can load this model and run inference.
462
+ ```python
463
+ from sentence_transformers import SentenceTransformer
464
+
465
+ # Download from the 🤗 Hub
466
+ model = SentenceTransformer("bobox/DeBERTa3-s-CustomPoolin-toytest-step1-checkpoints-tmp")
467
+ # Run inference
468
+ sentences = [
469
+ 'who was the first person who made the violin',
470
+ 'Violin The first makers of violins probably borrowed from various developments of the Byzantine lira. These included the rebec;[13] the Arabic rebab; the vielle (also known as the fidel or viuola); and the lira da braccio[11][14] The violin in its present form emerged in early 16th-century northern Italy. The earliest pictures of violins, albeit with three strings, are seen in northern Italy around 1530, at around the same time as the words "violino" and "vyollon" are seen in Italian and French documents. One of the earliest explicit descriptions of the instrument, including its tuning, is from the Epitome musical by Jambe de Fer, published in Lyon in 1556.[15] By this time, the violin had already begun to spread throughout Europe.',
471
+ "Alice in Chains Alice in Chains is an American rock band from Seattle, Washington, formed in 1987 by guitarist and vocalist Jerry Cantrell and drummer Sean Kinney,[1] who recruited bassist Mike Starr[1] and lead vocalist Layne Staley.[1][2][3] Starr was replaced by Mike Inez in 1993.[4] After Staley's death in 2002, William DuVall joined in 2006 as co-lead vocalist and rhythm guitarist. The band took its name from Staley's previous group, the glam metal band Alice N' Chains.[5][2]",
472
+ ]
473
+ embeddings = model.encode(sentences)
474
+ print(embeddings.shape)
475
+ # [3, 768]
476
+
477
+ # Get the similarity scores for the embeddings
478
+ similarities = model.similarity(embeddings, embeddings)
479
+ print(similarities.shape)
480
+ # [3, 3]
481
+ ```
482
+
483
+ <!--
484
+ ### Direct Usage (Transformers)
485
+
486
+ <details><summary>Click to see the direct usage in Transformers</summary>
487
+
488
+ </details>
489
+ -->
490
+
491
+ <!--
492
+ ### Downstream Usage (Sentence Transformers)
493
+
494
+ You can finetune this model on your own dataset.
495
+
496
+ <details><summary>Click to expand</summary>
497
+
498
+ </details>
499
+ -->
500
+
501
+ <!--
502
+ ### Out-of-Scope Use
503
+
504
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
505
+ -->
506
+
507
+ ## Evaluation
508
+
509
+ ### Metrics
510
+
511
+ #### Semantic Similarity
512
+ * Dataset: `sts-test`
513
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
514
+
515
+ | Metric | Value |
516
+ |:--------------------|:-----------|
517
+ | pearson_cosine | 0.1562 |
518
+ | **spearman_cosine** | **0.2236** |
519
+ | pearson_manhattan | 0.2217 |
520
+ | spearman_manhattan | 0.25 |
521
+ | pearson_euclidean | 0.1909 |
522
+ | spearman_euclidean | 0.2236 |
523
+ | pearson_dot | 0.1559 |
524
+ | spearman_dot | 0.2234 |
525
+ | pearson_max | 0.2217 |
526
+ | spearman_max | 0.25 |
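
The `pearson_max` and `spearman_max` rows are consistent with taking the maximum of each correlation over the four distance functions. The sketch below reproduces them from the full-precision values in the front matter. The helper `max_over_distances` is a hypothetical name, and treating the `*_max` rows this way is an inference from the numbers rather than a statement about the evaluator's internals.

```python
# Full-precision sts-test values copied from the model-index front matter.
sts_test = {
    "pearson_cosine": 0.1561600438268545,
    "spearman_cosine": 0.22356441354815124,
    "pearson_manhattan": 0.2216924674035587,
    "spearman_manhattan": 0.24997065610359018,
    "pearson_euclidean": 0.1908690981304929,
    "spearman_euclidean": 0.22363767136304896,
    "pearson_dot": 0.15588248423807516,
    "spearman_dot": 0.22337189362164545,
}


def max_over_distances(scores: dict, correlation: str) -> float:
    """Best value of one correlation type across all distance functions."""
    return max(v for k, v in scores.items() if k.startswith(correlation + "_"))


pearson_max = max_over_distances(sts_test, "pearson")
spearman_max = max_over_distances(sts_test, "spearman")
print(round(pearson_max, 4), round(spearman_max, 4))
```

Both maxima come from the Manhattan distance, and the `0.25` shown in the table is the rounded form of 0.24997.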
527
+
528
+ #### Binary Classification
529
+ * Dataset: `allNLI-dev`
530
+ * Evaluated with [<code>BinaryClassificationEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)
531
+
532
+ | Metric | Value |
533
+ |:-----------------------------|:-----------|
534
+ | cosine_accuracy | 0.666 |
535
+ | cosine_accuracy_threshold | 0.9798 |
536
+ | cosine_f1 | 0.5043 |
537
+ | cosine_f1_threshold | 0.8929 |
538
+ | cosine_precision | 0.3575 |
539
+ | cosine_recall | 0.8555 |
540
+ | cosine_ap | 0.4008 |
541
+ | dot_accuracy | 0.666 |
542
+ | dot_accuracy_threshold | 752.6635 |
543
+ | dot_f1 | 0.5043 |
544
+ | dot_f1_threshold | 685.9221 |
545
+ | dot_precision | 0.3575 |
546
+ | dot_recall | 0.8555 |
547
+ | dot_ap | 0.4007 |
548
+ | manhattan_accuracy | 0.668 |
549
+ | manhattan_accuracy_threshold | 144.5261 |
550
+ | manhattan_f1 | 0.5076 |
551
+ | manhattan_f1_threshold | 267.0469 |
552
+ | manhattan_precision | 0.3443 |
553
+ | manhattan_recall | 0.9653 |
554
+ | manhattan_ap | 0.4009 |
555
+ | euclidean_accuracy | 0.666 |
556
+ | euclidean_accuracy_threshold | 5.5726 |
557
+ | euclidean_f1 | 0.5043 |
558
+ | euclidean_f1_threshold | 12.8262 |
559
+ | euclidean_precision | 0.3575 |
560
+ | euclidean_recall | 0.8555 |
561
+ | euclidean_ap | 0.4008 |
562
+ | max_accuracy | 0.668 |
563
+ | max_accuracy_threshold | 752.6635 |
564
+ | max_f1 | 0.5076 |
565
+ | max_f1_threshold | 685.9221 |
566
+ | max_precision | 0.3575 |
567
+ | max_recall | 0.9653 |
568
+ | **max_ap** | **0.4009** |
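
Each F1 value in the table follows from the precision and recall on the same distance function through the harmonic mean, F1 = 2PR / (P + R). A short check against the full-precision `allNLI-dev` values from the front matter:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Full-precision allNLI-dev values copied from the model-index front matter.
cosine_f1 = f1_score(0.357487922705314, 0.8554913294797688)
manhattan_f1 = f1_score(0.3443298969072165, 0.9653179190751445)

print(round(cosine_f1, 4), round(manhattan_f1, 4))
```

The `max_*` rows look like per-metric maxima across distance functions: `max_precision` matches the cosine value and `max_recall` the Manhattan value. If that reading is right, the `max_*` rows do not describe a single operating point and should not be combined into one F1.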
569
+
570
+ #### Binary Classification
571
+ * Dataset: `Qnli-dev`
572
+ * Evaluated with [<code>BinaryClassificationEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)
573
+
574
+ | Metric | Value |
575
+ |:-----------------------------|:-----------|
576
+ | cosine_accuracy | 0.5918 |
577
+ | cosine_accuracy_threshold | 0.948 |
578
+ | cosine_f1 | 0.6292 |
579
+ | cosine_f1_threshold | 0.7762 |
580
+ | cosine_precision | 0.4599 |
581
+ | cosine_recall | 0.9958 |
582
+ | cosine_ap | 0.5658 |
583
+ | dot_accuracy | 0.5938 |
584
+ | dot_accuracy_threshold | 724.0911 |
585
+ | dot_f1 | 0.6292 |
586
+ | dot_f1_threshold | 596.2499 |
587
+ | dot_precision | 0.4599 |
588
+ | dot_recall | 0.9958 |
589
+ | dot_ap | 0.5657 |
590
+ | manhattan_accuracy | 0.6172 |
591
+ | manhattan_accuracy_threshold | 202.0796 |
592
+ | manhattan_f1 | 0.6292 |
593
+ | manhattan_f1_threshold | 307.9236 |
594
+ | manhattan_precision | 0.4599 |
595
+ | manhattan_recall | 0.9958 |
596
+ | manhattan_ap | 0.5892 |
597
+ | euclidean_accuracy | 0.5918 |
598
+ | euclidean_accuracy_threshold | 8.9389 |
599
+ | euclidean_f1 | 0.6292 |
600
+ | euclidean_f1_threshold | 18.5429 |
601
+ | euclidean_precision | 0.4599 |
602
+ | euclidean_recall | 0.9958 |
603
+ | euclidean_ap | 0.5658 |
604
+ | max_accuracy | 0.6172 |
605
+ | max_accuracy_threshold | 724.0911 |
606
+ | max_f1 | 0.6292 |
607
+ | max_f1_threshold | 596.2499 |
608
+ | max_precision | 0.4599 |
609
+ | max_recall | 0.9958 |
610
+ | **max_ap** | **0.5892** |
611
+
612
+ <!--
613
+ ## Bias, Risks and Limitations
614
+
615
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
616
+ -->
617
+
618
+ <!--
619
+ ### Recommendations
620
+
621
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
622
+ -->
623
+
624
+ ## Training Details
625
+
626
+ ### Training Dataset
627
+
628
+ #### Unnamed Dataset
629
+
630
+
631
+ * Size: 32,500 training samples
632
+ * Columns: <code>sentence1</code> and <code>sentence2</code>
633
+ * Approximate statistics based on the first 1000 samples:
634
+ | | sentence1 | sentence2 |
635
+ |:--------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
636
+ | type | string | string |
637
+ | details | <ul><li>min: 4 tokens</li><li>mean: 29.3 tokens</li><li>max: 343 tokens</li></ul> | <ul><li>min: 2 tokens</li><li>mean: 57.53 tokens</li><li>max: 512 tokens</li></ul> |
638
+ * Samples:
639
+ | sentence1 | sentence2 |
640
+ |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
641
+ | <code>A Slippery Dick is what type of creature?</code> | <code>The Slippery Dick (Juvenile) - Whats That Fish! Description Also known as Sand-reef Wrasses and Slippery Dick Wrasse. Found singly or in pairs or in groups constantly circling around reefs, sea grass beds and sandy areas. Colours highly variable especially between juvenile to adult. They feed on hard shell invertebrates. Length - 18cm Depth - 2-12m Widespread Western Atlantic & Caribbean Most reef fish seen by divers during the day are grazers, that cruise around just above the surface of the coral or snoop into crevices looking for algae, worms and small crustaceans. Wrasses have small protruding teeth and graze the bottom taking in a variety of snails, worms, crabs, shrimps and eggs. Any hard coats or thick shells are then ground down by their pharyngeal jaws and the delicacies inside digested. From juvenile to adult wrasses dramatically alter their colour and body shapes. Wrasses are always on the go during the day, but are the first to go to bed and the last to rise. Small wrasses dive below the sand to sleep and larger wrasses wedge themselves in crevasses. Related creatures Heads up! Many creatures change during their life. Juvenile fish become adults and some change shape or their colour. Some species change sex and others just get older. The following creature(s) are known relatives of the Slippery Dick (Juvenile). Click the image(s) to explore further or hover over to get a better view! Slippery Dick</code> |
642
+ | <code>e.&#9;in solids the atoms are closely locked in position and can only vibrate, in liquids the atoms and molecules are more loosely connected and can collide with and move past one another, while in gases the atoms or molecules are free to move independently, colliding frequently.</code> | <code>Within a substance, atoms that collide frequently and move independently of one another are most likely in a gas</code> |
643
+ | <code>In December 2015 , the film was ranked # 192 on IMDb .</code> | <code>As of December 2015 , it is the # 192 highest rated film on IMDb.</code> |
644
+ * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
645
+ ```json
646
+ {'guide': SentenceTransformer(
647
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
648
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
649
+ (2): Normalize()
650
+ ), 'temperature': 0.025}
651
+ ```
652
+
653
+ ### Evaluation Dataset
654
+
655
+ #### Unnamed Dataset
656
+
657
+
658
+ * Size: 1,664 evaluation samples
659
+ * Columns: <code>sentence1</code> and <code>sentence2</code>
660
+ * Approximate statistics based on the first 1000 samples:
661
+ | | sentence1 | sentence2 |
662
+ |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
663
+ | type | string | string |
664
+ | details | <ul><li>min: 4 tokens</li><li>mean: 28.74 tokens</li><li>max: 330 tokens</li></ul> | <ul><li>min: 2 tokens</li><li>mean: 56.55 tokens</li><li>max: 512 tokens</li></ul> |
665
+ * Samples:
666
+ | sentence1 | sentence2 |
667
+ |:--------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
668
+ | <code>What component of an organism, made up of many cells, in turn makes up an organ?</code> | <code></code> |
669
+ | <code>Diffusion Diffusion is a process where atoms or molecules move from areas of high concentration to areas of low concentration.</code> | <code>Diffusion is the process in which a substance naturally moves from an area of higher to lower concentration.</code> |
670
+ | <code>In the 1966 movie The Good, The Bad And The Ugly, Clint Eastwood played the Good" and Lee van Cleef played "the Bad", but who played "the Ugly"?</code> | <code>View All Photos (10) Movie Info In the last and the best installment of his so-called "Dollars" trilogy of Sergio Leone-directed "spaghetti westerns," Clint Eastwood reprised the role of a taciturn, enigmatic loner. Here he searches for a cache of stolen gold against rivals the Bad (Lee Van Cleef), a ruthless bounty hunter, and the Ugly (Eli Wallach), a Mexican bandit. Though dubbed "the Good," Eastwood's character is not much better than his opponents -- he is just smarter and shoots faster. The film's title reveals its ironic attitude toward the canonized heroes of the classical western. "The real West was the world of violence, fear, and brutal instincts," claimed Leone. "In pursuit of profit there is no such thing as good and evil, generosity or deviousness; everything depends on chance, and not the best wins but the luckiest." Immensely entertaining and beautifully shot in Techniscope by Tonino Delli Colli, the movie is a virtually definitive "spaghetti western," rivaled only by Leone's own Once Upon a Time in the West (1968). The main musical theme by Ennio Morricone hit #1 on the British pop charts. Originally released in Italy at 177 minutes, the movie was later cut for its international release. ~ Yuri German, Rovi Rating:</code> |
671
+ * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
672
+ ```json
673
+ {'guide': SentenceTransformer(
674
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
675
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
676
+ (2): Normalize()
677
+ ), 'temperature': 0.025}
678
+ ```
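
The `guide` and `temperature` parameters above can be read through a simplified, pure-Python sketch of guided in-sample negative selection. It is an illustration under stated assumptions, not the Sentence Transformers implementation: the function name `gist_style_loss` and the toy similarity matrices are hypothetical, and only anchor-to-candidate similarities are considered. A candidate is dropped as a negative when the guide model scores it higher than the true pair; the remaining scores are divided by the temperature (0.025 in this card) and fed to a cross-entropy.

```python
import math


def gist_style_loss(sim, guide_sim, temperature=0.025):
    """Simplified guided contrastive loss (illustrative helper, not a library API).

    sim[i][j]       -- cosine similarity of anchor i and candidate j from the trained model
    guide_sim[i][j] -- the same similarity computed by the guide model
    Candidate i is the true positive for anchor i.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        threshold = guide_sim[i][i]  # guide's score for the true pair
        logits = []
        for j in range(n):
            # Skip in-batch candidates the guide rates above the true pair:
            # they are likely false negatives.
            if j != i and guide_sim[i][j] > threshold:
                continue
            logits.append(sim[i][j] / temperature)
        # Numerically stable log-sum-exp over the surviving candidates.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(v - m) for v in logits))
        total += log_z - sim[i][i] / temperature
    return total / n


if __name__ == "__main__":
    trained = [[0.5, 0.9], [0.1, 0.8]]
    guide_masks_one = [[0.6, 0.95], [0.2, 0.9]]  # guide flags candidate 1 for anchor 0
    guide_masks_none = [[0.6, 0.1], [0.2, 0.9]]
    print(gist_style_loss(trained, guide_masks_one))
    print(gist_style_loss(trained, guide_masks_none))
```

At a temperature of 0.025, a cosine gap of 0.1 becomes a logit gap of 4, so the choice of which negatives survive the guide's filter has a large effect on the loss.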
679
+
680
+ ### Training Hyperparameters
681
+ #### Non-Default Hyperparameters
682
+
683
+ - `eval_strategy`: steps
684
+ - `per_device_train_batch_size`: 32
685
+ - `per_device_eval_batch_size`: 256
686
+ - `lr_scheduler_type`: cosine_with_min_lr
687
+ - `lr_scheduler_kwargs`: {'num_cycles': 0.5, 'min_lr': 3.3333333333333337e-06}
688
+ - `warmup_ratio`: 0.33
689
+ - `save_safetensors`: False
690
+ - `fp16`: True
691
+ - `push_to_hub`: True
692
+ - `hub_model_id`: bobox/DeBERTa3-s-CustomPoolin-toytest-step1-checkpoints-tmp
693
+ - `hub_strategy`: all_checkpoints
694
+ - `batch_sampler`: no_duplicates
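
As a rough worked example of how `lr_scheduler_type`, `lr_scheduler_kwargs`, and `warmup_ratio` interact, the sketch below evaluates a linear-warmup, cosine-decay schedule with a learning-rate floor in plain Python. It assumes the `learning_rate` of 5e-05 listed under All Hyperparameters and a hypothetical total of 3048 optimizer steps (inferred from the training log, where step 254 corresponds to epoch 0.25, times 3 epochs). `lr_at_step` is an illustrative helper, not a Transformers API, and the library's exact rounding of warmup steps may differ.

```python
import math


def lr_at_step(step, total_steps, base_lr=5e-5, min_lr=3.3333333333333337e-06,
               warmup_ratio=0.33, num_cycles=0.5):
    """Learning rate at a given step: linear warmup, then cosine decay to min_lr."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * 2.0 * num_cycles * progress))
    # Rescale the cosine factor so it bottoms out at min_lr instead of 0.
    min_ratio = min_lr / base_lr
    return base_lr * (cosine * (1.0 - min_ratio) + min_ratio)


if __name__ == "__main__":
    total = 3048  # assumption: 1016 steps per epoch x 3 epochs
    for step in (1, 153, 305, 1006, 2000, 3048):
        print(step, lr_at_step(step, total))
```

Under these assumptions the warmup lasts about 1006 steps, so the checkpoint at step 305 would still be in the warmup phase.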
695
+
696
+ #### All Hyperparameters
697
+ <details><summary>Click to expand</summary>
698
+
699
+ - `overwrite_output_dir`: False
700
+ - `do_predict`: False
701
+ - `eval_strategy`: steps
702
+ - `prediction_loss_only`: True
703
+ - `per_device_train_batch_size`: 32
704
+ - `per_device_eval_batch_size`: 256
705
+ - `per_gpu_train_batch_size`: None
706
+ - `per_gpu_eval_batch_size`: None
707
+ - `gradient_accumulation_steps`: 1
708
+ - `eval_accumulation_steps`: None
709
+ - `torch_empty_cache_steps`: None
710
+ - `learning_rate`: 5e-05
711
+ - `weight_decay`: 0.0
712
+ - `adam_beta1`: 0.9
713
+ - `adam_beta2`: 0.999
714
+ - `adam_epsilon`: 1e-08
715
+ - `max_grad_norm`: 1.0
716
+ - `num_train_epochs`: 3
717
+ - `max_steps`: -1
718
+ - `lr_scheduler_type`: cosine_with_min_lr
719
+ - `lr_scheduler_kwargs`: {'num_cycles': 0.5, 'min_lr': 3.3333333333333337e-06}
720
+ - `warmup_ratio`: 0.33
721
+ - `warmup_steps`: 0
722
+ - `log_level`: passive
723
+ - `log_level_replica`: warning
724
+ - `log_on_each_node`: True
725
+ - `logging_nan_inf_filter`: True
726
+ - `save_safetensors`: False
727
+ - `save_on_each_node`: False
728
+ - `save_only_model`: False
729
+ - `restore_callback_states_from_checkpoint`: False
730
+ - `no_cuda`: False
731
+ - `use_cpu`: False
732
+ - `use_mps_device`: False
733
+ - `seed`: 42
734
+ - `data_seed`: None
735
+ - `jit_mode_eval`: False
736
+ - `use_ipex`: False
737
+ - `bf16`: False
738
+ - `fp16`: True
739
+ - `fp16_opt_level`: O1
740
+ - `half_precision_backend`: auto
741
+ - `bf16_full_eval`: False
742
+ - `fp16_full_eval`: False
743
+ - `tf32`: None
744
+ - `local_rank`: 0
745
+ - `ddp_backend`: None
746
+ - `tpu_num_cores`: None
747
+ - `tpu_metrics_debug`: False
748
+ - `debug`: []
749
+ - `dataloader_drop_last`: False
750
+ - `dataloader_num_workers`: 0
751
+ - `dataloader_prefetch_factor`: None
752
+ - `past_index`: -1
753
+ - `disable_tqdm`: False
754
+ - `remove_unused_columns`: True
755
+ - `label_names`: None
756
+ - `load_best_model_at_end`: False
757
+ - `ignore_data_skip`: False
758
+ - `fsdp`: []
759
+ - `fsdp_min_num_params`: 0
760
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
761
+ - `fsdp_transformer_layer_cls_to_wrap`: None
762
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
763
+ - `deepspeed`: None
764
+ - `label_smoothing_factor`: 0.0
765
+ - `optim`: adamw_torch
766
+ - `optim_args`: None
767
+ - `adafactor`: False
768
+ - `group_by_length`: False
769
+ - `length_column_name`: length
770
+ - `ddp_find_unused_parameters`: None
771
+ - `ddp_bucket_cap_mb`: None
772
+ - `ddp_broadcast_buffers`: False
773
+ - `dataloader_pin_memory`: True
774
+ - `dataloader_persistent_workers`: False
775
+ - `skip_memory_metrics`: True
776
+ - `use_legacy_prediction_loop`: False
777
+ - `push_to_hub`: True
778
+ - `resume_from_checkpoint`: None
779
+ - `hub_model_id`: bobox/DeBERTa3-s-CustomPoolin-toytest-step1-checkpoints-tmp
780
+ - `hub_strategy`: all_checkpoints
781
+ - `hub_private_repo`: False
782
+ - `hub_always_push`: False
783
+ - `gradient_checkpointing`: False
784
+ - `gradient_checkpointing_kwargs`: None
785
+ - `include_inputs_for_metrics`: False
786
+ - `eval_do_concat_batches`: True
787
+ - `fp16_backend`: auto
788
+ - `push_to_hub_model_id`: None
789
+ - `push_to_hub_organization`: None
790
+ - `mp_parameters`:
791
+ - `auto_find_batch_size`: False
792
+ - `full_determinism`: False
793
+ - `torchdynamo`: None
794
+ - `ray_scope`: last
795
+ - `ddp_timeout`: 1800
796
+ - `torch_compile`: False
797
+ - `torch_compile_backend`: None
798
+ - `torch_compile_mode`: None
799
+ - `dispatch_batches`: None
800
+ - `split_batches`: None
801
+ - `include_tokens_per_second`: False
802
+ - `include_num_input_tokens_seen`: False
803
+ - `neftune_noise_alpha`: None
804
+ - `optim_target_modules`: None
805
+ - `batch_eval_metrics`: False
806
+ - `eval_on_start`: False
807
+ - `eval_use_gather_object`: False
808
+ - `batch_sampler`: no_duplicates
809
+ - `multi_dataset_batch_sampler`: proportional
810
+
811
+ </details>
812
+
813
+ ### Training Logs
814
+ <details><summary>Click to expand</summary>
815
+
816
+ | Epoch | Step | Training Loss | Validation Loss | sts-test_spearman_cosine | allNLI-dev_max_ap | Qnli-dev_max_ap |
817
+ |:------:|:----:|:-------------:|:---------------:|:------------------------:|:-----------------:|:---------------:|
818
+ | 0.0010 | 1 | 4.9603 | - | - | - | - |
819
+ | 0.0020 | 2 | 28.2529 | - | - | - | - |
820
+ | 0.0030 | 3 | 27.6365 | - | - | - | - |
821
+ | 0.0039 | 4 | 6.1387 | - | - | - | - |
822
+ | 0.0049 | 5 | 5.5753 | - | - | - | - |
823
+ | 0.0059 | 6 | 5.6951 | - | - | - | - |
824
+ | 0.0069 | 7 | 6.3533 | - | - | - | - |
825
+ | 0.0079 | 8 | 27.3848 | - | - | - | - |
826
+ | 0.0089 | 9 | 3.8501 | - | - | - | - |
827
+ | 0.0098 | 10 | 27.911 | - | - | - | - |
828
+ | 0.0108 | 11 | 4.9042 | - | - | - | - |
829
+ | 0.0118 | 12 | 6.8003 | - | - | - | - |
830
+ | 0.0128 | 13 | 5.7317 | - | - | - | - |
831
+ | 0.0138 | 14 | 20.261 | - | - | - | - |
832
+ | 0.0148 | 15 | 27.9051 | - | - | - | - |
833
+ | 0.0157 | 16 | 5.5959 | - | - | - | - |
834
+ | 0.0167 | 17 | 5.8052 | - | - | - | - |
835
+ | 0.0177 | 18 | 4.5088 | - | - | - | - |
836
+ | 0.0187 | 19 | 7.3472 | - | - | - | - |
837
+ | 0.0197 | 20 | 5.8668 | - | - | - | - |
838
+ | 0.0207 | 21 | 6.4083 | - | - | - | - |
839
+ | 0.0217 | 22 | 6.011 | - | - | - | - |
840
+ | 0.0226 | 23 | 5.2394 | - | - | - | - |
841
+ | 0.0236 | 24 | 4.2966 | - | - | - | - |
842
+ | 0.0246 | 25 | 26.605 | - | - | - | - |
843
+ | 0.0256 | 26 | 6.2067 | - | - | - | - |
844
+ | 0.0266 | 27 | 6.0346 | - | - | - | - |
845
+ | 0.0276 | 28 | 5.4676 | - | - | - | - |
846
+ | 0.0285 | 29 | 6.4292 | - | - | - | - |
847
+ | 0.0295 | 30 | 26.6452 | - | - | - | - |
848
+ | 0.0305 | 31 | 18.8401 | - | - | - | - |
849
+ | 0.0315 | 32 | 7.4531 | - | - | - | - |
850
+ | 0.0325 | 33 | 4.8286 | - | - | - | - |
851
+ | 0.0335 | 34 | 5.0078 | - | - | - | - |
852
+ | 0.0344 | 35 | 5.4115 | - | - | - | - |
853
+ | 0.0354 | 36 | 5.4196 | - | - | - | - |
854
+ | 0.0364 | 37 | 4.5023 | - | - | - | - |
855
+ | 0.0374 | 38 | 5.376 | - | - | - | - |
856
+ | 0.0384 | 39 | 5.2303 | - | - | - | - |
857
+ | 0.0394 | 40 | 5.6694 | - | - | - | - |
858
+ | 0.0404 | 41 | 4.7825 | - | - | - | - |
859
+ | 0.0413 | 42 | 4.6507 | - | - | - | - |
860
+ | 0.0423 | 43 | 24.2072 | - | - | - | - |
861
+ | 0.0433 | 44 | 4.9285 | - | - | - | - |
862
+ | 0.0443 | 45 | 6.326 | - | - | - | - |
863
+ | 0.0453 | 46 | 4.5724 | - | - | - | - |
864
+ | 0.0463 | 47 | 4.754 | - | - | - | - |
865
+ | 0.0472 | 48 | 5.5443 | - | - | - | - |
866
+ | 0.0482 | 49 | 4.5764 | - | - | - | - |
867
+ | 0.0492 | 50 | 5.1434 | - | - | - | - |
868
+ | 0.0502 | 51 | 22.6991 | - | - | - | - |
869
+ | 0.0512 | 52 | 5.4277 | - | - | - | - |
870
+ | 0.0522 | 53 | 5.0178 | - | - | - | - |
871
+ | 0.0531 | 54 | 4.8779 | - | - | - | - |
872
+ | 0.0541 | 55 | 4.2884 | - | - | - | - |
873
+ | 0.0551 | 56 | 16.0994 | - | - | - | - |
874
+ | 0.0561 | 57 | 21.31 | - | - | - | - |
875
+ | 0.0571 | 58 | 4.9721 | - | - | - | - |
876
+ | 0.0581 | 59 | 5.143 | - | - | - | - |
877
+ | 0.0591 | 60 | 3.5933 | - | - | - | - |
878
+ | 0.0600 | 61 | 5.2559 | - | - | - | - |
879
+ | 0.0610 | 62 | 4.0757 | - | - | - | - |
880
+ | 0.0620 | 63 | 3.6612 | - | - | - | - |
881
+ | 0.0630 | 64 | 4.7505 | - | - | - | - |
882
+ | 0.0640 | 65 | 4.1979 | - | - | - | - |
883
+ | 0.0650 | 66 | 3.9982 | - | - | - | - |
884
+ | 0.0659 | 67 | 4.7065 | - | - | - | - |
885
+ | 0.0669 | 68 | 5.3413 | - | - | - | - |
886
+ | 0.0679 | 69 | 3.6964 | - | - | - | - |
887
+ | 0.0689 | 70 | 17.8774 | - | - | - | - |
888
+ | 0.0699 | 71 | 4.8154 | - | - | - | - |
889
+ | 0.0709 | 72 | 4.8356 | - | - | - | - |
890
+ | 0.0719 | 73 | 4.568 | - | - | - | - |
891
+ | 0.0728 | 74 | 4.0898 | - | - | - | - |
892
+ | 0.0738 | 75 | 3.4502 | - | - | - | - |
893
+ | 0.0748 | 76 | 3.7733 | - | - | - | - |
894
+ | 0.0758 | 77 | 4.5204 | - | - | - | - |
895
+ | 0.0768 | 78 | 4.2526 | - | - | - | - |
896
+ | 0.0778 | 79 | 4.4398 | - | - | - | - |
897
+ | 0.0787 | 80 | 4.0988 | - | - | - | - |
898
+ | 0.0797 | 81 | 3.9704 | - | - | - | - |
899
+ | 0.0807 | 82 | 4.3343 | - | - | - | - |
900
+ | 0.0817 | 83 | 4.2587 | - | - | - | - |
901
+ | 0.0827 | 84 | 15.0149 | - | - | - | - |
902
+ | 0.0837 | 85 | 14.6599 | - | - | - | - |
903
+ | 0.0846 | 86 | 4.0623 | - | - | - | - |
904
+ | 0.0856 | 87 | 3.7597 | - | - | - | - |
905
+ | 0.0866 | 88 | 4.3433 | - | - | - | - |
906
+ | 0.0876 | 89 | 4.0287 | - | - | - | - |
907
+ | 0.0886 | 90 | 4.6257 | - | - | - | - |
908
+ | 0.0896 | 91 | 13.4689 | - | - | - | - |
909
+ | 0.0906 | 92 | 4.6583 | - | - | - | - |
910
+ | 0.0915 | 93 | 4.2682 | - | - | - | - |
911
+ | 0.0925 | 94 | 4.468 | - | - | - | - |
912
+ | 0.0935 | 95 | 3.4333 | - | - | - | - |
913
+ | 0.0945 | 96 | 12.7654 | - | - | - | - |
914
+ | 0.0955 | 97 | 3.5577 | - | - | - | - |
915
+ | 0.0965 | 98 | 12.5875 | - | - | - | - |
916
+ | 0.0974 | 99 | 4.2206 | - | - | - | - |
917
+ | 0.0984 | 100 | 3.5981 | - | - | - | - |
918
+ | 0.0994 | 101 | 3.5575 | - | - | - | - |
919
+ | 0.1004 | 102 | 4.0271 | - | - | - | - |
920
+ | 0.1014 | 103 | 4.0803 | - | - | - | - |
921
+ | 0.1024 | 104 | 4.0886 | - | - | - | - |
922
+ | 0.1033 | 105 | 4.176 | - | - | - | - |
923
+ | 0.1043 | 106 | 4.6653 | - | - | - | - |
924
+ | 0.1053 | 107 | 4.3076 | - | - | - | - |
925
+ | 0.1063 | 108 | 8.7282 | - | - | - | - |
926
+ | 0.1073 | 109 | 3.4192 | - | - | - | - |
927
+ | 0.1083 | 110 | 10.6027 | - | - | - | - |
928
+ | 0.1093 | 111 | 4.0959 | - | - | - | - |
929
+ | 0.1102 | 112 | 4.2785 | - | - | - | - |
930
+ | 0.1112 | 113 | 3.9945 | - | - | - | - |
931
+ | 0.1122 | 114 | 10.0652 | - | - | - | - |
932
+ | 0.1132 | 115 | 3.8621 | - | - | - | - |
933
+ | 0.1142 | 116 | 4.3975 | - | - | - | - |
934
+ | 0.1152 | 117 | 9.7899 | - | - | - | - |
935
+ | 0.1161 | 118 | 4.3812 | - | - | - | - |
936
+ | 0.1171 | 119 | 3.8715 | - | - | - | - |
937
+ | 0.1181 | 120 | 3.8327 | - | - | - | - |
938
+ | 0.1191 | 121 | 3.5103 | - | - | - | - |
939
+ | 0.1201 | 122 | 9.3158 | - | - | - | - |
940
+ | 0.1211 | 123 | 3.7201 | - | - | - | - |
941
+ | 0.1220 | 124 | 3.4311 | - | - | - | - |
942
+ | 0.1230 | 125 | 3.7946 | - | - | - | - |
943
+ | 0.1240 | 126 | 4.0456 | - | - | - | - |
944
+ | 0.125 | 127 | 3.482 | - | - | - | - |
945
+ | 0.1260 | 128 | 3.1901 | - | - | - | - |
946
+ | 0.1270 | 129 | 3.414 | - | - | - | - |
947
+ | 0.1280 | 130 | 3.4967 | - | - | - | - |
948
+ | 0.1289 | 131 | 3.6594 | - | - | - | - |
949
+ | 0.1299 | 132 | 8.066 | - | - | - | - |
950
+ | 0.1309 | 133 | 3.7872 | - | - | - | - |
951
+ | 0.1319 | 134 | 4.0023 | - | - | - | - |
952
+ | 0.1329 | 135 | 3.7728 | - | - | - | - |
953
+ | 0.1339 | 136 | 3.1893 | - | - | - | - |
954
+ | 0.1348 | 137 | 3.3635 | - | - | - | - |
955
+ | 0.1358 | 138 | 4.0195 | - | - | - | - |
956
+ | 0.1368 | 139 | 4.1097 | - | - | - | - |
957
+ | 0.1378 | 140 | 3.7903 | - | - | - | - |
958
+ | 0.1388 | 141 | 3.5748 | - | - | - | - |
959
+ | 0.1398 | 142 | 3.8104 | - | - | - | - |
960
+ | 0.1407 | 143 | 8.0411 | - | - | - | - |
961
+ | 0.1417 | 144 | 3.4819 | - | - | - | - |
962
+ | 0.1427 | 145 | 3.452 | - | - | - | - |
963
+ | 0.1437 | 146 | 3.5861 | - | - | - | - |
964
+ | 0.1447 | 147 | 3.4324 | - | - | - | - |
965
+ | 0.1457 | 148 | 3.521 | - | - | - | - |
966
+ | 0.1467 | 149 | 3.8868 | - | - | - | - |
967
+ | 0.1476 | 150 | 8.1191 | - | - | - | - |
968
+ | 0.1486 | 151 | 3.6447 | - | - | - | - |
969
+ | 0.1496 | 152 | 2.9436 | - | - | - | - |
970
+ | 0.1506 | 153 | 8.1535 | 2.2032 | 0.2236 | 0.4009 | 0.5892 |
971
+ | 0.1516 | 154 | 3.9619 | - | - | - | - |
972
+ | 0.1526 | 155 | 3.1301 | - | - | - | - |
973
+ | 0.1535 | 156 | 3.0478 | - | - | - | - |
974
+ | 0.1545 | 157 | 3.2986 | - | - | - | - |
975
+ | 0.1555 | 158 | 3.2847 | - | - | - | - |
976
+ | 0.1565 | 159 | 3.6599 | - | - | - | - |
977
+ | 0.1575 | 160 | 3.2238 | - | - | - | - |
978
+ | 0.1585 | 161 | 2.8897 | - | - | - | - |
979
+ | 0.1594 | 162 | 3.9443 | - | - | - | - |
980
+ | 0.1604 | 163 | 3.3733 | - | - | - | - |
981
+ | 0.1614 | 164 | 3.7444 | - | - | - | - |
982
+ | 0.1624 | 165 | 3.4813 | - | - | - | - |
983
+ | 0.1634 | 166 | 2.6865 | - | - | - | - |
984
+ | 0.1644 | 167 | 2.7587 | - | - | - | - |
985
+ | 0.1654 | 168 | 3.3628 | - | - | - | - |
986
+ | 0.1663 | 169 | 3.0035 | - | - | - | - |
987
+ | 0.1673 | 170 | 10.1591 | - | - | - | - |
988
+ | 0.1683 | 171 | 3.5366 | - | - | - | - |
989
+ | 0.1693 | 172 | 8.4047 | - | - | - | - |
990
+ | 0.1703 | 173 | 3.8643 | - | - | - | - |
991
+ | 0.1713 | 174 | 3.3529 | - | - | - | - |
992
+ | 0.1722 | 175 | 3.7143 | - | - | - | - |
993
+ | 0.1732 | 176 | 3.3323 | - | - | - | - |
994
+ | 0.1742 | 177 | 3.1206 | - | - | - | - |
995
+ | 0.1752 | 178 | 3.1348 | - | - | - | - |
996
+ | 0.1762 | 179 | 7.6011 | - | - | - | - |
997
+ | 0.1772 | 180 | 3.7025 | - | - | - | - |
998
+ | 0.1781 | 181 | 10.5662 | - | - | - | - |
999
+ | 0.1791 | 182 | 8.966 | - | - | - | - |
1000
+ | 0.1801 | 183 | 9.426 | - | - | - | - |
1001
+ | 0.1811 | 184 | 3.0025 | - | - | - | - |
1002
+ | 0.1821 | 185 | 7.0984 | - | - | - | - |
1003
+ | 0.1831 | 186 | 7.3808 | - | - | - | - |
1004
+ | 0.1841 | 187 | 2.8657 | - | - | - | - |
1005
+ | 0.1850 | 188 | 6.5636 | - | - | - | - |
1006
+ | 0.1860 | 189 | 3.4702 | - | - | - | - |
1007
+ | 0.1870 | 190 | 5.9302 | - | - | - | - |
1008
+ | 0.1880 | 191 | 3.2406 | - | - | - | - |
1009
+ | 0.1890 | 192 | 3.4459 | - | - | - | - |
1010
+ | 0.1900 | 193 | 5.269 | - | - | - | - |
1011
+ | 0.1909 | 194 | 4.8605 | - | - | - | - |
1012
+ | 0.1919 | 195 | 2.9891 | - | - | - | - |
1013
+ | 0.1929 | 196 | 3.6681 | - | - | - | - |
1014
+ | 0.1939 | 197 | 3.1589 | - | - | - | - |
1015
+ | 0.1949 | 198 | 3.1835 | - | - | - | - |
1016
+ | 0.1959 | 199 | 3.7561 | - | - | - | - |
1017
+ | 0.1969 | 200 | 4.0891 | - | - | - | - |
1018
+ | 0.1978 | 201 | 3.563 | - | - | - | - |
1019
+ | 0.1988 | 202 | 3.7433 | - | - | - | - |
1020
+ | 0.1998 | 203 | 3.3813 | - | - | - | - |
1021
+ | 0.2008 | 204 | 5.2311 | - | - | - | - |
1022
+ | 0.2018 | 205 | 3.3494 | - | - | - | - |
1023
+ | 0.2028 | 206 | 3.3533 | - | - | - | - |
1024
+ | 0.2037 | 207 | 3.688 | - | - | - | - |
1025
+ | 0.2047 | 208 | 3.5342 | - | - | - | - |
1026
+ | 0.2057 | 209 | 4.9381 | - | - | - | - |
1027
+ | 0.2067 | 210 | 3.1839 | - | - | - | - |
1028
+ | 0.2077 | 211 | 3.0465 | - | - | - | - |
1029
+ | 0.2087 | 212 | 3.1232 | - | - | - | - |
1030
+ | 0.2096 | 213 | 4.6297 | - | - | - | - |
1031
+ | 0.2106 | 214 | 2.9834 | - | - | - | - |
1032
+ | 0.2116 | 215 | 4.2231 | - | - | - | - |
1033
+ | 0.2126 | 216 | 3.1458 | - | - | - | - |
1034
+ | 0.2136 | 217 | 3.2525 | - | - | - | - |
1035
+ | 0.2146 | 218 | 3.5971 | - | - | - | - |
1036
+ | 0.2156 | 219 | 3.5616 | - | - | - | - |
1037
+ | 0.2165 | 220 | 3.2378 | - | - | - | - |
1038
+ | 0.2175 | 221 | 2.9075 | - | - | - | - |
1039
+ | 0.2185 | 222 | 3.0391 | - | - | - | - |
1040
+ | 0.2195 | 223 | 3.5573 | - | - | - | - |
1041
+ | 0.2205 | 224 | 3.2092 | - | - | - | - |
1042
+ | 0.2215 | 225 | 3.2646 | - | - | - | - |
1043
+ | 0.2224 | 226 | 3.0886 | - | - | - | - |
1044
+ | 0.2234 | 227 | 3.5241 | - | - | - | - |
1045
+ | 0.2244 | 228 | 3.0111 | - | - | - | - |
1046
+ | 0.2254 | 229 | 3.707 | - | - | - | - |
1047
+ | 0.2264 | 230 | 5.3822 | - | - | - | - |
1048
+ | 0.2274 | 231 | 3.2646 | - | - | - | - |
1049
+ | 0.2283 | 232 | 2.7021 | - | - | - | - |
1050
+ | 0.2293 | 233 | 3.5131 | - | - | - | - |
1051
+ | 0.2303 | 234 | 3.103 | - | - | - | - |
1052
+ | 0.2313 | 235 | 2.9535 | - | - | - | - |
1053
+ | 0.2323 | 236 | 2.9631 | - | - | - | - |
1054
+ | 0.2333 | 237 | 2.8068 | - | - | - | - |
1055
+ | 0.2343 | 238 | 3.4251 | - | - | - | - |
1056
+ | 0.2352 | 239 | 2.8495 | - | - | - | - |
1057
+ | 0.2362 | 240 | 2.9972 | - | - | - | - |
1058
+ | 0.2372 | 241 | 3.3509 | - | - | - | - |
1059
+ | 0.2382 | 242 | 2.9234 | - | - | - | - |
1060
+ | 0.2392 | 243 | 2.4086 | - | - | - | - |
1061
+ | 0.2402 | 244 | 3.1282 | - | - | - | - |
1062
+ | 0.2411 | 245 | 2.3352 | - | - | - | - |
1063
+ | 0.2421 | 246 | 2.4706 | - | - | - | - |
1064
+ | 0.2431 | 247 | 3.5449 | - | - | - | - |
1065
+ | 0.2441 | 248 | 2.8963 | - | - | - | - |
1066
+ | 0.2451 | 249 | 2.773 | - | - | - | - |
1067
+ | 0.2461 | 250 | 2.355 | - | - | - | - |
1068
+ | 0.2470 | 251 | 2.656 | - | - | - | - |
1069
+ | 0.2480 | 252 | 2.6221 | - | - | - | - |
1070
+ | 0.2490 | 253 | 8.6739 | - | - | - | - |
1071
+ | 0.25 | 254 | 10.8242 | - | - | - | - |
1072
+ | 0.2510 | 255 | 2.3408 | - | - | - | - |
1073
+ | 0.2520 | 256 | 2.1221 | - | - | - | - |
1074
+ | 0.2530 | 257 | 3.295 | - | - | - | - |
1075
+ | 0.2539 | 258 | 2.5896 | - | - | - | - |
1076
+ | 0.2549 | 259 | 2.1215 | - | - | - | - |
1077
+ | 0.2559 | 260 | 9.4851 | - | - | - | - |
1078
+ | 0.2569 | 261 | 2.1982 | - | - | - | - |
1079
+ | 0.2579 | 262 | 3.0568 | - | - | - | - |
1080
+ | 0.2589 | 263 | 2.6269 | - | - | - | - |
1081
+ | 0.2598 | 264 | 2.4792 | - | - | - | - |
1082
+ | 0.2608 | 265 | 1.9445 | - | - | - | - |
1083
+ | 0.2618 | 266 | 2.4061 | - | - | - | - |
1084
+ | 0.2628 | 267 | 8.3116 | - | - | - | - |
1085
+ | 0.2638 | 268 | 8.0804 | - | - | - | - |
1086
+ | 0.2648 | 269 | 2.1674 | - | - | - | - |
1087
+ | 0.2657 | 270 | 7.1975 | - | - | - | - |
1088
+ | 0.2667 | 271 | 5.9104 | - | - | - | - |
1089
+ | 0.2677 | 272 | 2.498 | - | - | - | - |
1090
+ | 0.2687 | 273 | 2.5249 | - | - | - | - |
1091
+ | 0.2697 | 274 | 2.7152 | - | - | - | - |
1092
+ | 0.2707 | 275 | 2.7904 | - | - | - | - |
1093
+ | 0.2717 | 276 | 2.7745 | - | - | - | - |
1094
+ | 0.2726 | 277 | 2.9741 | - | - | - | - |
1095
+ | 0.2736 | 278 | 1.8215 | - | - | - | - |
1096
+ | 0.2746 | 279 | 4.6844 | - | - | - | - |
1097
+ | 0.2756 | 280 | 2.8613 | - | - | - | - |
1098
+ | 0.2766 | 281 | 2.7147 | - | - | - | - |
1099
+ | 0.2776 | 282 | 2.814 | - | - | - | - |
1100
+ | 0.2785 | 283 | 2.3569 | - | - | - | - |
1101
+ | 0.2795 | 284 | 2.672 | - | - | - | - |
1102
+ | 0.2805 | 285 | 3.2052 | - | - | - | - |
1103
+ | 0.2815 | 286 | 2.8056 | - | - | - | - |
1104
+ | 0.2825 | 287 | 2.6268 | - | - | - | - |
1105
+ | 0.2835 | 288 | 2.5641 | - | - | - | - |
1106
+ | 0.2844 | 289 | 2.4475 | - | - | - | - |
1107
+ | 0.2854 | 290 | 2.7377 | - | - | - | - |
1108
+ | 0.2864 | 291 | 2.3831 | - | - | - | - |
1109
+ | 0.2874 | 292 | 8.8069 | - | - | - | - |
1110
+ | 0.2884 | 293 | 2.186 | - | - | - | - |
1111
+ | 0.2894 | 294 | 2.3389 | - | - | - | - |
1112
+ | 0.2904 | 295 | 1.9744 | - | - | - | - |
1113
+ | 0.2913 | 296 | 2.4491 | - | - | - | - |
1114
+ | 0.2923 | 297 | 2.5668 | - | - | - | - |
1115
+ | 0.2933 | 298 | 2.1939 | - | - | - | - |
1116
+ | 0.2943 | 299 | 2.2832 | - | - | - | - |
1117
+ | 0.2953 | 300 | 2.7508 | - | - | - | - |
1118
+ | 0.2963 | 301 | 2.5206 | - | - | - | - |
1119
+ | 0.2972 | 302 | 2.3522 | - | - | - | - |
1120
+ | 0.2982 | 303 | 2.7186 | - | - | - | - |
1121
+ | 0.2992 | 304 | 2.1369 | - | - | - | - |
1122
+ | 0.3002 | 305 | 9.7972 | - | - | - | - |
1123
+
1124
+ </details>
1125
+
1126
+ ### Framework Versions
1127
+ - Python: 3.10.12
1128
+ - Sentence Transformers: 3.2.1
1129
+ - Transformers: 4.44.2
1130
+ - PyTorch: 2.5.0+cu121
1131
+ - Accelerate: 0.34.2
1132
+ - Datasets: 3.0.2
1133
+ - Tokenizers: 0.19.1
1134
+
1135
+ ## Citation
1136
+
1137
+ ### BibTeX
1138
+
1139
+ #### Sentence Transformers
1140
+ ```bibtex
1141
+ @inproceedings{reimers-2019-sentence-bert,
1142
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
1143
+ author = "Reimers, Nils and Gurevych, Iryna",
1144
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
1145
+ month = "11",
1146
+ year = "2019",
1147
+ publisher = "Association for Computational Linguistics",
1148
+ url = "https://arxiv.org/abs/1908.10084",
1149
+ }
1150
+ ```
1151
+
1152
+ #### GISTEmbedLoss
1153
+ ```bibtex
1154
+ @misc{solatorio2024gistembed,
1155
+ title={GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning},
1156
+ author={Aivin V. Solatorio},
1157
+ year={2024},
1158
+ eprint={2402.16829},
1159
+ archivePrefix={arXiv},
1160
+ primaryClass={cs.LG}
1161
+ }
1162
+ ```
1163
+
1164
+ <!--
1165
+ ## Glossary
1166
+
1167
+ *Clearly define terms in order to be accessible across audiences.*
1168
+ -->
1169
+
1170
+ <!--
1171
+ ## Model Card Authors
1172
+
1173
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
1174
+ -->
1175
+
1176
+ <!--
1177
+ ## Model Card Contact
1178
+
1179
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
1180
+ -->
checkpoint-305/added_tokens.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "[MASK]": 128000
+ }
checkpoint-305/config.json ADDED
@@ -0,0 +1,35 @@
+ {
+   "_name_or_path": "microsoft/deberta-v3-small",
+   "architectures": [
+     "DebertaV2Model"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-07,
+   "max_position_embeddings": 512,
+   "max_relative_positions": -1,
+   "model_type": "deberta-v2",
+   "norm_rel_ebd": "layer_norm",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 6,
+   "pad_token_id": 0,
+   "pooler_dropout": 0,
+   "pooler_hidden_act": "gelu",
+   "pooler_hidden_size": 768,
+   "pos_att_type": [
+     "p2c",
+     "c2p"
+   ],
+   "position_biased_input": false,
+   "position_buckets": 256,
+   "relative_attention": true,
+   "share_att_key": true,
+   "torch_dtype": "float32",
+   "transformers_version": "4.44.2",
+   "type_vocab_size": 0,
+   "vocab_size": 128100
+ }
checkpoint-305/config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.2.1",
+     "transformers": "4.44.2",
+     "pytorch": "2.5.0+cu121"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
checkpoint-305/modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_AdvancedWeightedPooling",
+     "type": "__main__.AdvancedWeightedPooling"
+   }
+ ]
checkpoint-305/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:65d529ba9ab24c9d8adb080544039e1a708aa68d1217c8ee4a7a1fba3aab6ef7
+ size 151299002
checkpoint-305/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a57c90603cd53748313e111bacadbca362be5ab2596e4cc509d8f0c45cd399ec
+ size 565251810
checkpoint-305/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:253b239ce62b42cce45af0a6b6211997a8c98938b54440533396c5907de9bf77
+ size 14180
checkpoint-305/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e92a39523463434c6815f5d795ae20b59b2f4d483e4b38fb95e050b91044805
+ size 1256
checkpoint-305/sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
checkpoint-305/special_tokens_map.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "bos_token": "[CLS]",
+   "cls_token": "[CLS]",
+   "eos_token": "[SEP]",
+   "mask_token": "[MASK]",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
checkpoint-305/spm.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c679fbf93643d19aab7ee10c0b99e460bdbc02fedf34b92b05af343b4af586fd
+ size 2464616
checkpoint-305/tokenizer.json ADDED
The diff for this file is too large to render.
checkpoint-305/tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128000": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "[CLS]",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_lower_case": false,
+   "eos_token": "[SEP]",
+   "mask_token": "[MASK]",
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "sp_model_kwargs": {},
+   "split_by_punct": false,
+   "tokenizer_class": "DebertaV2Tokenizer",
+   "unk_token": "[UNK]",
+   "vocab_type": "spm"
+ }
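A note on the `model_max_length` value in the tokenizer config above: it equals the float `1e30` converted to an integer, which suggests it is a "no limit configured" placeholder rather than a real sequence length. The sketch below checks that arithmetic and shows one way a consumer might guard against the placeholder. The helper `effective_max_length` and its `fallback` of 512 are hypothetical illustrations, not part of this checkpoint or of any library API.

```python
# Value stored under "model_max_length" in checkpoint-305/tokenizer_config.json.
MODEL_MAX_LENGTH = 1000000000000000019884624838656

# 1e30 is a binary double, so converting it to int exposes the rounding error:
# the result is close to, but not exactly, 10**30.
SENTINEL = int(1e30)


def effective_max_length(configured: int, fallback: int = 512) -> int:
    """Return a usable truncation length.

    `fallback` is an assumed default chosen for illustration only; pick a
    value that matches the encoder's real position limit.
    """
    # Treat anything at or above the placeholder as "no limit configured".
    return fallback if configured >= SENTINEL else configured


if __name__ == "__main__":
    print(SENTINEL == MODEL_MAX_LENGTH)
    print(effective_max_length(MODEL_MAX_LENGTH))
```

With a guard like this, truncation code does not try to pad or truncate to an astronomically large length when the config carries the placeholder.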
checkpoint-305/trainer_state.json ADDED
@@ -0,0 +1,2257 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 0.3001968503937008,
5
+ "eval_steps": 153,
6
+ "global_step": 305,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.000984251968503937,
13
+ "grad_norm": NaN,
14
+ "learning_rate": 0.0,
15
+ "loss": 4.9603,
16
+ "step": 1
17
+ },
18
+ {
19
+ "epoch": 0.001968503937007874,
20
+ "grad_norm": NaN,
21
+ "learning_rate": 0.0,
22
+ "loss": 28.2529,
23
+ "step": 2
24
+ },
25
+ {
26
+ "epoch": 0.002952755905511811,
27
+ "grad_norm": Infinity,
28
+ "learning_rate": 0.0,
29
+ "loss": 27.6365,
30
+ "step": 3
31
+ },
32
+ {
33
+ "epoch": 0.003937007874015748,
34
+ "grad_norm": 36.58856201171875,
35
+ "learning_rate": 9.940357852882705e-10,
36
+ "loss": 6.1387,
37
+ "step": 4
38
+ },
39
+ {
40
+ "epoch": 0.004921259842519685,
41
+ "grad_norm": 34.671714782714844,
42
+ "learning_rate": 1.988071570576541e-09,
43
+ "loss": 5.5753,
44
+ "step": 5
45
+ },
46
+ {
47
+ "epoch": 0.005905511811023622,
48
+ "grad_norm": 35.53398132324219,
49
+ "learning_rate": 2.9821073558648116e-09,
50
+ "loss": 5.6951,
51
+ "step": 6
52
+ },
53
+ {
54
+ "epoch": 0.006889763779527559,
55
+ "grad_norm": Infinity,
56
+ "learning_rate": 2.9821073558648116e-09,
57
+ "loss": 6.3533,
58
+ "step": 7
59
+ },
60
+ {
61
+ "epoch": 0.007874015748031496,
62
+ "grad_norm": Infinity,
63
+ "learning_rate": 2.9821073558648116e-09,
64
+ "loss": 27.3848,
65
+ "step": 8
66
+ },
67
+ {
68
+ "epoch": 0.008858267716535433,
69
+ "grad_norm": 16.95815086364746,
70
+ "learning_rate": 3.976143141153082e-09,
71
+ "loss": 3.8501,
72
+ "step": 9
73
+ },
74
+ {
75
+ "epoch": 0.00984251968503937,
76
+ "grad_norm": 69.63166046142578,
77
+ "learning_rate": 4.970178926441353e-09,
78
+ "loss": 27.911,
79
+ "step": 10
80
+ },
81
+ {
82
+ "epoch": 0.010826771653543307,
83
+ "grad_norm": 29.516401290893555,
84
+ "learning_rate": 5.964214711729623e-09,
85
+ "loss": 4.9042,
86
+ "step": 11
87
+ },
88
+ {
89
+ "epoch": 0.011811023622047244,
90
+ "grad_norm": 39.30358123779297,
91
+ "learning_rate": 6.9582504970178946e-09,
92
+ "loss": 6.8003,
93
+ "step": 12
94
+ },
95
+ {
96
+ "epoch": 0.012795275590551181,
97
+ "grad_norm": 36.83983612060547,
98
+ "learning_rate": 7.952286282306164e-09,
99
+ "loss": 5.7317,
100
+ "step": 13
101
+ },
102
+ {
103
+ "epoch": 0.013779527559055118,
104
+ "grad_norm": 55.4377326965332,
105
+ "learning_rate": 8.946322067594435e-09,
106
+ "loss": 20.261,
107
+ "step": 14
108
+ },
109
+ {
110
+ "epoch": 0.014763779527559055,
111
+ "grad_norm": 68.62684631347656,
112
+ "learning_rate": 9.940357852882705e-09,
113
+ "loss": 27.9051,
114
+ "step": 15
115
+ },
116
+ {
117
+ "epoch": 0.015748031496062992,
118
+ "grad_norm": 31.27193832397461,
119
+ "learning_rate": 1.0934393638170978e-08,
120
+ "loss": 5.5959,
121
+ "step": 16
122
+ },
123
+ {
124
+ "epoch": 0.01673228346456693,
125
+ "grad_norm": 36.56179428100586,
126
+ "learning_rate": 1.1928429423459246e-08,
127
+ "loss": 5.8052,
128
+ "step": 17
129
+ },
130
+ {
131
+ "epoch": 0.017716535433070866,
132
+ "grad_norm": 23.220964431762695,
133
+ "learning_rate": 1.2922465208747517e-08,
134
+ "loss": 4.5088,
135
+ "step": 18
136
+ },
137
+ {
138
+ "epoch": 0.018700787401574805,
139
+ "grad_norm": 44.375823974609375,
140
+ "learning_rate": 1.3916500994035789e-08,
141
+ "loss": 7.3472,
142
+ "step": 19
143
+ },
144
+ {
145
+ "epoch": 0.01968503937007874,
146
+ "grad_norm": 40.480628967285156,
147
+ "learning_rate": 1.4910536779324056e-08,
148
+ "loss": 5.8668,
149
+ "step": 20
150
+ },
151
+ {
152
+ "epoch": 0.02066929133858268,
153
+ "grad_norm": 45.778358459472656,
154
+ "learning_rate": 1.590457256461233e-08,
155
+ "loss": 6.4083,
156
+ "step": 21
157
+ },
158
+ {
159
+ "epoch": 0.021653543307086614,
160
+ "grad_norm": 41.16820526123047,
161
+ "learning_rate": 1.68986083499006e-08,
162
+ "loss": 6.011,
163
+ "step": 22
164
+ },
165
+ {
166
+ "epoch": 0.022637795275590553,
167
+ "grad_norm": 27.49931526184082,
168
+ "learning_rate": 1.789264413518887e-08,
169
+ "loss": 5.2394,
170
+ "step": 23
171
+ },
172
+ {
173
+ "epoch": 0.023622047244094488,
174
+ "grad_norm": 20.837919235229492,
175
+ "learning_rate": 1.888667992047714e-08,
176
+ "loss": 4.2966,
177
+ "step": 24
178
+ },
179
+ {
180
+ "epoch": 0.024606299212598427,
181
+ "grad_norm": 65.7834243774414,
182
+ "learning_rate": 1.988071570576541e-08,
183
+ "loss": 26.605,
184
+ "step": 25
185
+ },
186
+ {
187
+ "epoch": 0.025590551181102362,
188
+ "grad_norm": 44.794960021972656,
189
+ "learning_rate": 2.087475149105368e-08,
190
+ "loss": 6.2067,
191
+ "step": 26
192
+ },
193
+ {
194
+ "epoch": 0.0265748031496063,
195
+ "grad_norm": 30.213058471679688,
196
+ "learning_rate": 2.1868787276341955e-08,
197
+ "loss": 6.0346,
198
+ "step": 27
199
+ },
200
+ {
201
+ "epoch": 0.027559055118110236,
202
+ "grad_norm": 27.49605941772461,
203
+ "learning_rate": 2.2862823061630224e-08,
204
+ "loss": 5.4676,
205
+ "step": 28
206
+ },
207
+ {
208
+ "epoch": 0.028543307086614175,
209
+ "grad_norm": 35.94675827026367,
210
+ "learning_rate": 2.3856858846918493e-08,
211
+ "loss": 6.4292,
212
+ "step": 29
213
+ },
214
+ {
215
+ "epoch": 0.02952755905511811,
216
+ "grad_norm": 65.74870300292969,
217
+ "learning_rate": 2.4850894632206765e-08,
218
+ "loss": 26.6452,
219
+ "step": 30
220
+ },
221
+ {
222
+ "epoch": 0.03051181102362205,
223
+ "grad_norm": 49.05170440673828,
224
+ "learning_rate": 2.5844930417495034e-08,
225
+ "loss": 18.8401,
226
+ "step": 31
227
+ },
228
+ {
229
+ "epoch": 0.031496062992125984,
230
+ "grad_norm": 49.38396453857422,
231
+ "learning_rate": 2.6838966202783303e-08,
232
+ "loss": 7.4531,
233
+ "step": 32
234
+ },
235
+ {
236
+ "epoch": 0.03248031496062992,
237
+ "grad_norm": 25.29888916015625,
238
+ "learning_rate": 2.7833001988071578e-08,
239
+ "loss": 4.8286,
240
+ "step": 33
241
+ },
242
+ {
243
+ "epoch": 0.03346456692913386,
244
+ "grad_norm": 28.889131546020508,
245
+ "learning_rate": 2.8827037773359847e-08,
246
+ "loss": 5.0078,
247
+ "step": 34
248
+ },
249
+ {
250
+ "epoch": 0.0344488188976378,
251
+ "grad_norm": 33.611812591552734,
252
+ "learning_rate": 2.982107355864811e-08,
253
+ "loss": 5.4115,
254
+ "step": 35
255
+ },
256
+ {
257
+ "epoch": 0.03543307086614173,
258
+ "grad_norm": 31.503385543823242,
259
+ "learning_rate": 3.081510934393639e-08,
260
+ "loss": 5.4196,
261
+ "step": 36
262
+ },
263
+ {
264
+ "epoch": 0.03641732283464567,
265
+ "grad_norm": 23.436307907104492,
266
+ "learning_rate": 3.180914512922466e-08,
267
+ "loss": 4.5023,
268
+ "step": 37
269
+ },
270
+ {
271
+ "epoch": 0.03740157480314961,
272
+ "grad_norm": 35.096893310546875,
273
+ "learning_rate": 3.280318091451293e-08,
274
+ "loss": 5.376,
275
+ "step": 38
276
+ },
277
+ {
278
+ "epoch": 0.038385826771653545,
279
+ "grad_norm": 25.531570434570312,
280
+ "learning_rate": 3.37972166998012e-08,
281
+ "loss": 5.2303,
282
+ "step": 39
283
+ },
284
+ {
285
+ "epoch": 0.03937007874015748,
286
+ "grad_norm": 29.393512725830078,
287
+ "learning_rate": 3.479125248508947e-08,
288
+ "loss": 5.6694,
289
+ "step": 40
290
+ },
291
+ {
292
+ "epoch": 0.040354330708661415,
293
+ "grad_norm": 26.839847564697266,
294
+ "learning_rate": 3.578528827037774e-08,
295
+ "loss": 4.7825,
296
+ "step": 41
297
+ },
298
+ {
299
+ "epoch": 0.04133858267716536,
300
+ "grad_norm": 21.11309814453125,
301
+ "learning_rate": 3.6779324055666005e-08,
302
+ "loss": 4.6507,
303
+ "step": 42
304
+ },
305
+ {
306
+ "epoch": 0.04232283464566929,
307
+ "grad_norm": 61.134098052978516,
308
+ "learning_rate": 3.777335984095428e-08,
309
+ "loss": 24.2072,
310
+ "step": 43
311
+ },
312
+ {
313
+ "epoch": 0.04330708661417323,
314
+ "grad_norm": 26.884740829467773,
315
+ "learning_rate": 3.8767395626242556e-08,
316
+ "loss": 4.9285,
317
+ "step": 44
318
+ },
319
+ {
320
+ "epoch": 0.04429133858267716,
321
+ "grad_norm": 33.500144958496094,
322
+ "learning_rate": 3.976143141153082e-08,
323
+ "loss": 6.326,
324
+ "step": 45
325
+ },
326
+ {
327
+ "epoch": 0.045275590551181105,
328
+ "grad_norm": 17.54262924194336,
329
+ "learning_rate": 4.0755467196819094e-08,
330
+ "loss": 4.5724,
331
+ "step": 46
332
+ },
333
+ {
334
+ "epoch": 0.04625984251968504,
335
+ "grad_norm": 23.30596351623535,
336
+ "learning_rate": 4.174950298210736e-08,
337
+ "loss": 4.754,
338
+ "step": 47
339
+ },
340
+ {
341
+ "epoch": 0.047244094488188976,
342
+ "grad_norm": 34.042816162109375,
343
+ "learning_rate": 4.274353876739563e-08,
344
+ "loss": 5.5443,
345
+ "step": 48
346
+ },
347
+ {
348
+ "epoch": 0.04822834645669291,
349
+ "grad_norm": 21.270071029663086,
350
+ "learning_rate": 4.373757455268391e-08,
351
+ "loss": 4.5764,
352
+ "step": 49
353
+ },
354
+ {
355
+ "epoch": 0.04921259842519685,
356
+ "grad_norm": 24.815349578857422,
357
+ "learning_rate": 4.4731610337972176e-08,
358
+ "loss": 5.1434,
359
+ "step": 50
360
+ },
361
+ {
362
+ "epoch": 0.05019685039370079,
363
+ "grad_norm": 55.756900787353516,
364
+ "learning_rate": 4.572564612326045e-08,
365
+ "loss": 22.6991,
366
+ "step": 51
367
+ },
368
+ {
369
+ "epoch": 0.051181102362204724,
370
+ "grad_norm": 23.544273376464844,
371
+ "learning_rate": 4.6719681908548713e-08,
372
+ "loss": 5.4277,
373
+ "step": 52
374
+ },
375
+ {
376
+ "epoch": 0.05216535433070866,
377
+ "grad_norm": 21.845703125,
378
+ "learning_rate": 4.7713717693836986e-08,
379
+ "loss": 5.0178,
380
+ "step": 53
381
+ },
382
+ {
383
+ "epoch": 0.0531496062992126,
384
+ "grad_norm": 16.331026077270508,
385
+ "learning_rate": 4.870775347912525e-08,
386
+ "loss": 4.8779,
387
+ "step": 54
388
+ },
389
+ {
390
+ "epoch": 0.054133858267716536,
391
+ "grad_norm": 16.72958755493164,
392
+ "learning_rate": 4.970178926441353e-08,
393
+ "loss": 4.2884,
394
+ "step": 55
395
+ },
396
+ {
397
+ "epoch": 0.05511811023622047,
398
+ "grad_norm": 41.22899627685547,
399
+ "learning_rate": 5.06958250497018e-08,
400
+ "loss": 16.0994,
401
+ "step": 56
402
+ },
403
+ {
404
+ "epoch": 0.05610236220472441,
405
+ "grad_norm": 52.578941345214844,
406
+ "learning_rate": 5.168986083499007e-08,
407
+ "loss": 21.31,
408
+ "step": 57
409
+ },
410
+ {
411
+ "epoch": 0.05708661417322835,
412
+ "grad_norm": 15.741512298583984,
413
+ "learning_rate": 5.268389662027834e-08,
414
+ "loss": 4.9721,
415
+ "step": 58
416
+ },
417
+ {
418
+ "epoch": 0.058070866141732284,
419
+ "grad_norm": 23.57728385925293,
420
+ "learning_rate": 5.3677932405566605e-08,
421
+ "loss": 5.143,
422
+ "step": 59
423
+ },
424
+ {
425
+ "epoch": 0.05905511811023622,
426
+ "grad_norm": 12.699495315551758,
427
+ "learning_rate": 5.467196819085488e-08,
428
+ "loss": 3.5933,
429
+ "step": 60
430
+ },
431
+ {
432
+ "epoch": 0.060039370078740155,
433
+ "grad_norm": 17.49776840209961,
434
+ "learning_rate": 5.5666003976143156e-08,
435
+ "loss": 5.2559,
436
+ "step": 61
437
+ },
438
+ {
439
+ "epoch": 0.0610236220472441,
440
+ "grad_norm": 13.251837730407715,
441
+ "learning_rate": 5.666003976143142e-08,
442
+ "loss": 4.0757,
443
+ "step": 62
444
+ },
445
+ {
446
+ "epoch": 0.06200787401574803,
447
+ "grad_norm": 11.610112190246582,
448
+ "learning_rate": 5.7654075546719694e-08,
449
+ "loss": 3.6612,
450
+ "step": 63
451
+ },
452
+ {
453
+ "epoch": 0.06299212598425197,
454
+ "grad_norm": 19.652385711669922,
455
+ "learning_rate": 5.864811133200796e-08,
456
+ "loss": 4.7505,
457
+ "step": 64
458
+ },
459
+ {
460
+ "epoch": 0.0639763779527559,
461
+ "grad_norm": 13.930652618408203,
462
+ "learning_rate": 5.964214711729623e-08,
463
+ "loss": 4.1979,
464
+ "step": 65
465
+ },
466
+ {
467
+ "epoch": 0.06496062992125984,
468
+ "grad_norm": 11.817291259765625,
469
+ "learning_rate": 6.06361829025845e-08,
470
+ "loss": 3.9982,
471
+ "step": 66
472
+ },
473
+ {
474
+ "epoch": 0.06594488188976377,
475
+ "grad_norm": 18.6019287109375,
476
+ "learning_rate": 6.163021868787278e-08,
477
+ "loss": 4.7065,
478
+ "step": 67
479
+ },
480
+ {
481
+ "epoch": 0.06692913385826772,
482
+ "grad_norm": 27.056259155273438,
483
+ "learning_rate": 6.262425447316104e-08,
484
+ "loss": 5.3413,
485
+ "step": 68
486
+ },
487
+ {
488
+ "epoch": 0.06791338582677166,
489
+ "grad_norm": 13.010573387145996,
490
+ "learning_rate": 6.361829025844931e-08,
491
+ "loss": 3.6964,
492
+ "step": 69
493
+ },
494
+ {
495
+ "epoch": 0.0688976377952756,
496
+ "grad_norm": 49.04872131347656,
497
+ "learning_rate": 6.461232604373759e-08,
498
+ "loss": 17.8774,
499
+ "step": 70
500
+ },
501
+ {
502
+ "epoch": 0.06988188976377953,
503
+ "grad_norm": 19.028602600097656,
504
+ "learning_rate": 6.560636182902586e-08,
505
+ "loss": 4.8154,
506
+ "step": 71
507
+ },
508
+ {
509
+ "epoch": 0.07086614173228346,
510
+ "grad_norm": 17.006460189819336,
511
+ "learning_rate": 6.660039761431412e-08,
512
+ "loss": 4.8356,
513
+ "step": 72
514
+ },
515
+ {
516
+ "epoch": 0.0718503937007874,
517
+ "grad_norm": 17.15074920654297,
518
+ "learning_rate": 6.75944333996024e-08,
519
+ "loss": 4.568,
520
+ "step": 73
521
+ },
522
+ {
523
+ "epoch": 0.07283464566929133,
524
+ "grad_norm": 14.456765174865723,
525
+ "learning_rate": 6.858846918489067e-08,
526
+ "loss": 4.0898,
527
+ "step": 74
528
+ },
529
+ {
530
+ "epoch": 0.07381889763779527,
531
+ "grad_norm": 9.999987602233887,
532
+ "learning_rate": 6.958250497017893e-08,
533
+ "loss": 3.4502,
534
+ "step": 75
535
+ },
536
+ {
537
+ "epoch": 0.07480314960629922,
538
+ "grad_norm": 13.652220726013184,
539
+ "learning_rate": 7.057654075546721e-08,
540
+ "loss": 3.7733,
541
+ "step": 76
542
+ },
543
+ {
544
+ "epoch": 0.07578740157480315,
545
+ "grad_norm": 17.76757049560547,
546
+ "learning_rate": 7.157057654075548e-08,
547
+ "loss": 4.5204,
548
+ "step": 77
549
+ },
550
+ {
551
+ "epoch": 0.07677165354330709,
552
+ "grad_norm": 11.42149829864502,
553
+ "learning_rate": 7.256461232604374e-08,
554
+ "loss": 4.2526,
555
+ "step": 78
556
+ },
557
+ {
558
+ "epoch": 0.07775590551181102,
559
+ "grad_norm": 16.21160125732422,
560
+ "learning_rate": 7.355864811133201e-08,
561
+ "loss": 4.4398,
562
+ "step": 79
563
+ },
564
+ {
565
+ "epoch": 0.07874015748031496,
566
+ "grad_norm": 12.522687911987305,
567
+ "learning_rate": 7.455268389662029e-08,
568
+ "loss": 4.0988,
569
+ "step": 80
570
+ },
571
+ {
572
+ "epoch": 0.0797244094488189,
573
+ "grad_norm": 12.63741683959961,
574
+ "learning_rate": 7.554671968190855e-08,
575
+ "loss": 3.9704,
576
+ "step": 81
577
+ },
578
+ {
579
+ "epoch": 0.08070866141732283,
580
+ "grad_norm": 11.259520530700684,
581
+ "learning_rate": 7.654075546719683e-08,
582
+ "loss": 4.3343,
583
+ "step": 82
584
+ },
585
+ {
586
+ "epoch": 0.08169291338582677,
587
+ "grad_norm": 14.228102684020996,
588
+ "learning_rate": 7.753479125248511e-08,
589
+ "loss": 4.2587,
590
+ "step": 83
591
+ },
592
+ {
593
+ "epoch": 0.08267716535433071,
594
+ "grad_norm": 45.13947677612305,
595
+ "learning_rate": 7.852882703777338e-08,
596
+ "loss": 15.0149,
597
+ "step": 84
598
+ },
599
+ {
600
+ "epoch": 0.08366141732283465,
601
+ "grad_norm": 45.17081069946289,
602
+ "learning_rate": 7.952286282306164e-08,
603
+ "loss": 14.6599,
604
+ "step": 85
605
+ },
606
+ {
607
+ "epoch": 0.08464566929133858,
608
+ "grad_norm": 15.967412948608398,
609
+ "learning_rate": 8.051689860834992e-08,
610
+ "loss": 4.0623,
611
+ "step": 86
612
+ },
613
+ {
614
+ "epoch": 0.08562992125984252,
615
+ "grad_norm": 10.085712432861328,
616
+ "learning_rate": 8.151093439363819e-08,
617
+ "loss": 3.7597,
618
+ "step": 87
619
+ },
620
+ {
621
+ "epoch": 0.08661417322834646,
622
+ "grad_norm": 13.406641960144043,
623
+ "learning_rate": 8.250497017892645e-08,
624
+ "loss": 4.3433,
625
+ "step": 88
626
+ },
627
+ {
628
+ "epoch": 0.08759842519685039,
629
+ "grad_norm": 9.052105903625488,
630
+ "learning_rate": 8.349900596421472e-08,
631
+ "loss": 4.0287,
632
+ "step": 89
633
+ },
634
+ {
635
+ "epoch": 0.08858267716535433,
636
+ "grad_norm": 12.489309310913086,
637
+ "learning_rate": 8.4493041749503e-08,
638
+ "loss": 4.6257,
639
+ "step": 90
640
+ },
641
+ {
642
+ "epoch": 0.08956692913385826,
643
+ "grad_norm": 40.478675842285156,
644
+ "learning_rate": 8.548707753479126e-08,
645
+ "loss": 13.4689,
646
+ "step": 91
647
+ },
648
+ {
649
+ "epoch": 0.09055118110236221,
650
+ "grad_norm": 14.329568862915039,
651
+ "learning_rate": 8.648111332007953e-08,
652
+ "loss": 4.6583,
653
+ "step": 92
654
+ },
655
+ {
656
+ "epoch": 0.09153543307086615,
657
+ "grad_norm": 10.07358455657959,
658
+ "learning_rate": 8.747514910536782e-08,
659
+ "loss": 4.2682,
660
+ "step": 93
661
+ },
662
+ {
663
+ "epoch": 0.09251968503937008,
664
+ "grad_norm": 12.861531257629395,
665
+ "learning_rate": 8.846918489065609e-08,
666
+ "loss": 4.468,
667
+ "step": 94
668
+ },
669
+ {
670
+ "epoch": 0.09350393700787402,
671
+ "grad_norm": 12.121801376342773,
672
+ "learning_rate": 8.946322067594435e-08,
673
+ "loss": 3.4333,
674
+ "step": 95
675
+ },
676
+ {
677
+ "epoch": 0.09448818897637795,
678
+ "grad_norm": 40.14929962158203,
679
+ "learning_rate": 9.045725646123262e-08,
680
+ "loss": 12.7654,
681
+ "step": 96
682
+ },
683
+ {
684
+ "epoch": 0.09547244094488189,
685
+ "grad_norm": 11.191559791564941,
686
+ "learning_rate": 9.14512922465209e-08,
687
+ "loss": 3.5577,
688
+ "step": 97
689
+ },
690
+ {
691
+ "epoch": 0.09645669291338582,
692
+ "grad_norm": 40.13950729370117,
693
+ "learning_rate": 9.244532803180916e-08,
694
+ "loss": 12.5875,
695
+ "step": 98
696
+ },
697
+ {
698
+ "epoch": 0.09744094488188976,
699
+ "grad_norm": 12.063894271850586,
700
+ "learning_rate": 9.343936381709743e-08,
701
+ "loss": 4.2206,
702
+ "step": 99
703
+ },
704
+ {
705
+ "epoch": 0.0984251968503937,
706
+ "grad_norm": 10.066577911376953,
707
+ "learning_rate": 9.44333996023857e-08,
708
+ "loss": 3.5981,
709
+ "step": 100
710
+ },
711
+ {
712
+ "epoch": 0.09940944881889764,
713
+ "grad_norm": 10.917841911315918,
714
+ "learning_rate": 9.542743538767397e-08,
715
+ "loss": 3.5575,
716
+ "step": 101
717
+ },
718
+ {
719
+ "epoch": 0.10039370078740158,
720
+ "grad_norm": 11.512818336486816,
721
+ "learning_rate": 9.642147117296224e-08,
722
+ "loss": 4.0271,
723
+ "step": 102
724
+ },
725
+ {
726
+ "epoch": 0.10137795275590551,
727
+ "grad_norm": 13.737354278564453,
728
+ "learning_rate": 9.74155069582505e-08,
729
+ "loss": 4.0803,
730
+ "step": 103
731
+ },
732
+ {
733
+ "epoch": 0.10236220472440945,
734
+ "grad_norm": 12.92113208770752,
735
+ "learning_rate": 9.840954274353878e-08,
736
+ "loss": 4.0886,
737
+ "step": 104
738
+ },
739
+ {
740
+ "epoch": 0.10334645669291338,
741
+ "grad_norm": 16.23849868774414,
742
+ "learning_rate": 9.940357852882706e-08,
743
+ "loss": 4.176,
744
+ "step": 105
745
+ },
746
+ {
747
+ "epoch": 0.10433070866141732,
748
+ "grad_norm": 13.244183540344238,
749
+ "learning_rate": 1.0039761431411533e-07,
750
+ "loss": 4.6653,
751
+ "step": 106
752
+ },
753
+ {
754
+ "epoch": 0.10531496062992125,
755
+ "grad_norm": 12.089069366455078,
756
+ "learning_rate": 1.013916500994036e-07,
757
+ "loss": 4.3076,
758
+ "step": 107
759
+ },
760
+ {
761
+ "epoch": 0.1062992125984252,
762
+ "grad_norm": 28.261154174804688,
763
+ "learning_rate": 1.0238568588469187e-07,
764
+ "loss": 8.7282,
765
+ "step": 108
766
+ },
767
+ {
768
+ "epoch": 0.10728346456692914,
769
+ "grad_norm": 10.686351776123047,
770
+ "learning_rate": 1.0337972166998014e-07,
771
+ "loss": 3.4192,
772
+ "step": 109
773
+ },
774
+ {
775
+ "epoch": 0.10826771653543307,
776
+ "grad_norm": 41.12674331665039,
777
+ "learning_rate": 1.043737574552684e-07,
778
+ "loss": 10.6027,
779
+ "step": 110
780
+ },
781
+ {
782
+ "epoch": 0.10925196850393701,
783
+ "grad_norm": 13.403799057006836,
784
+ "learning_rate": 1.0536779324055668e-07,
785
+ "loss": 4.0959,
786
+ "step": 111
787
+ },
788
+ {
789
+ "epoch": 0.11023622047244094,
790
+ "grad_norm": 11.321606636047363,
791
+ "learning_rate": 1.0636182902584495e-07,
792
+ "loss": 4.2785,
793
+ "step": 112
794
+ },
795
+ {
796
+ "epoch": 0.11122047244094488,
797
+ "grad_norm": 14.717891693115234,
798
+ "learning_rate": 1.0735586481113321e-07,
799
+ "loss": 3.9945,
800
+ "step": 113
801
+ },
802
+ {
803
+ "epoch": 0.11220472440944881,
804
+ "grad_norm": 45.00100326538086,
805
+ "learning_rate": 1.0834990059642149e-07,
806
+ "loss": 10.0652,
807
+ "step": 114
808
+ },
809
+ {
810
+ "epoch": 0.11318897637795275,
811
+ "grad_norm": 12.017743110656738,
812
+ "learning_rate": 1.0934393638170976e-07,
813
+ "loss": 3.8621,
814
+ "step": 115
815
+ },
816
+ {
817
+ "epoch": 0.1141732283464567,
818
+ "grad_norm": 14.086198806762695,
819
+ "learning_rate": 1.1033797216699802e-07,
820
+ "loss": 4.3975,
821
+ "step": 116
822
+ },
823
+ {
824
+ "epoch": 0.11515748031496063,
825
+ "grad_norm": 43.2061767578125,
826
+ "learning_rate": 1.1133200795228631e-07,
827
+ "loss": 9.7899,
828
+ "step": 117
829
+ },
830
+ {
831
+ "epoch": 0.11614173228346457,
832
+ "grad_norm": 12.647043228149414,
833
+ "learning_rate": 1.1232604373757458e-07,
834
+ "loss": 4.3812,
835
+ "step": 118
836
+ },
837
+ {
838
+ "epoch": 0.1171259842519685,
839
+ "grad_norm": 11.732013702392578,
840
+ "learning_rate": 1.1332007952286284e-07,
841
+ "loss": 3.8715,
842
+ "step": 119
843
+ },
844
+ {
845
+ "epoch": 0.11811023622047244,
846
+ "grad_norm": 9.888449668884277,
847
+ "learning_rate": 1.1431411530815111e-07,
848
+ "loss": 3.8327,
849
+ "step": 120
850
+ },
851
+ {
852
+ "epoch": 0.11909448818897637,
853
+ "grad_norm": 9.061322212219238,
854
+ "learning_rate": 1.1530815109343939e-07,
855
+ "loss": 3.5103,
856
+ "step": 121
857
+ },
858
+ {
859
+ "epoch": 0.12007874015748031,
860
+ "grad_norm": 38.643943786621094,
861
+ "learning_rate": 1.1630218687872765e-07,
862
+ "loss": 9.3158,
863
+ "step": 122
864
+ },
865
+ {
866
+ "epoch": 0.12106299212598425,
867
+ "grad_norm": 11.240921974182129,
868
+ "learning_rate": 1.1729622266401592e-07,
869
+ "loss": 3.7201,
870
+ "step": 123
871
+ },
872
+ {
873
+ "epoch": 0.1220472440944882,
874
+ "grad_norm": 11.231223106384277,
875
+ "learning_rate": 1.182902584493042e-07,
876
+ "loss": 3.4311,
877
+ "step": 124
878
+ },
879
+ {
880
+ "epoch": 0.12303149606299213,
881
+ "grad_norm": 11.026339530944824,
882
+ "learning_rate": 1.1928429423459245e-07,
883
+ "loss": 3.7946,
884
+ "step": 125
885
+ },
886
+ {
887
+ "epoch": 0.12401574803149606,
888
+ "grad_norm": 11.620814323425293,
889
+ "learning_rate": 1.2027833001988073e-07,
890
+ "loss": 4.0456,
891
+ "step": 126
892
+ },
893
+ {
894
+ "epoch": 0.125,
895
+ "grad_norm": 9.652909278869629,
896
+ "learning_rate": 1.21272365805169e-07,
897
+ "loss": 3.482,
898
+ "step": 127
899
+ },
900
+ {
901
+ "epoch": 0.12598425196850394,
902
+ "grad_norm": 9.82579231262207,
903
+ "learning_rate": 1.222664015904573e-07,
904
+ "loss": 3.1901,
905
+ "step": 128
906
+ },
907
+ {
908
+ "epoch": 0.12696850393700787,
909
+ "grad_norm": 10.219281196594238,
910
+ "learning_rate": 1.2326043737574557e-07,
911
+ "loss": 3.414,
912
+ "step": 129
913
+ },
914
+ {
915
+ "epoch": 0.1279527559055118,
916
+ "grad_norm": 9.734150886535645,
917
+ "learning_rate": 1.2425447316103382e-07,
918
+ "loss": 3.4967,
919
+ "step": 130
920
+ },
921
+ {
922
+ "epoch": 0.12893700787401574,
923
+ "grad_norm": 10.505714416503906,
924
+ "learning_rate": 1.2524850894632207e-07,
925
+ "loss": 3.6594,
926
+ "step": 131
927
+ },
928
+ {
929
+ "epoch": 0.12992125984251968,
930
+ "grad_norm": 39.98779296875,
931
+ "learning_rate": 1.2624254473161035e-07,
932
+ "loss": 8.066,
933
+ "step": 132
934
+ },
935
+ {
936
+ "epoch": 0.1309055118110236,
937
+ "grad_norm": 9.36937427520752,
938
+ "learning_rate": 1.2723658051689863e-07,
939
+ "loss": 3.7872,
940
+ "step": 133
941
+ },
942
+ {
943
+ "epoch": 0.13188976377952755,
944
+ "grad_norm": 10.953369140625,
945
+ "learning_rate": 1.282306163021869e-07,
946
+ "loss": 4.0023,
947
+ "step": 134
948
+ },
949
+ {
950
+ "epoch": 0.1328740157480315,
951
+ "grad_norm": 11.828601837158203,
952
+ "learning_rate": 1.2922465208747519e-07,
953
+ "loss": 3.7728,
954
+ "step": 135
955
+ },
956
+ {
957
+ "epoch": 0.13385826771653545,
958
+ "grad_norm": 9.795777320861816,
959
+ "learning_rate": 1.3021868787276344e-07,
960
+ "loss": 3.1893,
961
+ "step": 136
962
+ },
963
+ {
964
+ "epoch": 0.13484251968503938,
965
+ "grad_norm": 10.846212387084961,
966
+ "learning_rate": 1.3121272365805172e-07,
967
+ "loss": 3.3635,
968
+ "step": 137
969
+ },
970
+ {
971
+ "epoch": 0.13582677165354332,
972
+ "grad_norm": 13.001614570617676,
973
+ "learning_rate": 1.3220675944333997e-07,
974
+ "loss": 4.0195,
975
+ "step": 138
976
+ },
977
+ {
978
+ "epoch": 0.13681102362204725,
979
+ "grad_norm": 11.749286651611328,
980
+ "learning_rate": 1.3320079522862825e-07,
981
+ "loss": 4.1097,
982
+ "step": 139
983
+ },
984
+ {
985
+ "epoch": 0.1377952755905512,
986
+ "grad_norm": 12.08881664276123,
987
+ "learning_rate": 1.3419483101391653e-07,
988
+ "loss": 3.7903,
989
+ "step": 140
990
+ },
991
+ {
992
+ "epoch": 0.13877952755905512,
993
+ "grad_norm": 10.898722648620605,
994
+ "learning_rate": 1.351888667992048e-07,
995
+ "loss": 3.5748,
996
+ "step": 141
997
+ },
998
+ {
999
+ "epoch": 0.13976377952755906,
1000
+ "grad_norm": 11.456235885620117,
1001
+ "learning_rate": 1.3618290258449306e-07,
1002
+ "loss": 3.8104,
1003
+ "step": 142
1004
+ },
1005
+ {
1006
+ "epoch": 0.140748031496063,
1007
+ "grad_norm": 47.697425842285156,
1008
+ "learning_rate": 1.3717693836978134e-07,
1009
+ "loss": 8.0411,
1010
+ "step": 143
1011
+ },
1012
+ {
1013
+ "epoch": 0.14173228346456693,
1014
+ "grad_norm": 10.220965385437012,
1015
+ "learning_rate": 1.381709741550696e-07,
1016
+ "loss": 3.4819,
1017
+ "step": 144
1018
+ },
1019
+ {
1020
+ "epoch": 0.14271653543307086,
1021
+ "grad_norm": 13.412939071655273,
1022
+ "learning_rate": 1.3916500994035787e-07,
1023
+ "loss": 3.452,
1024
+ "step": 145
1025
+ },
1026
+ {
1027
+ "epoch": 0.1437007874015748,
1028
+ "grad_norm": 11.866227149963379,
1029
+ "learning_rate": 1.4015904572564615e-07,
1030
+ "loss": 3.5861,
1031
+ "step": 146
1032
+ },
1033
+ {
1034
+ "epoch": 0.14468503937007873,
1035
+ "grad_norm": 10.724785804748535,
1036
+ "learning_rate": 1.4115308151093443e-07,
1037
+ "loss": 3.4324,
1038
+ "step": 147
1039
+ },
1040
+ {
1041
+ "epoch": 0.14566929133858267,
1042
+ "grad_norm": 11.023091316223145,
1043
+ "learning_rate": 1.421471172962227e-07,
1044
+ "loss": 3.521,
1045
+ "step": 148
1046
+ },
1047
+ {
1048
+ "epoch": 0.1466535433070866,
1049
+ "grad_norm": 12.216788291931152,
1050
+ "learning_rate": 1.4314115308151096e-07,
1051
+ "loss": 3.8868,
1052
+ "step": 149
1053
+ },
1054
+ {
1055
+ "epoch": 0.14763779527559054,
1056
+ "grad_norm": 71.6353530883789,
1057
+ "learning_rate": 1.4413518886679924e-07,
1058
+ "loss": 8.1191,
1059
+ "step": 150
1060
+ },
1061
+ {
1062
+ "epoch": 0.1486220472440945,
1063
+ "grad_norm": 10.8847074508667,
1064
+ "learning_rate": 1.451292246520875e-07,
1065
+ "loss": 3.6447,
1066
+ "step": 151
1067
+ },
1068
+ {
1069
+ "epoch": 0.14960629921259844,
1070
+ "grad_norm": 11.547443389892578,
1071
+ "learning_rate": 1.4612326043737577e-07,
1072
+ "loss": 2.9436,
1073
+ "step": 152
1074
+ },
1075
+ {
1076
+ "epoch": 0.15059055118110237,
1077
+ "grad_norm": 42.24600601196289,
1078
+ "learning_rate": 1.4711729622266402e-07,
1079
+ "loss": 8.1535,
1080
+ "step": 153
1081
+ },
+ {
+ "epoch": 0.15059055118110237,
+ "eval_Qnli-dev_cosine_accuracy": 0.591796875,
+ "eval_Qnli-dev_cosine_accuracy_threshold": 0.9479926824569702,
+ "eval_Qnli-dev_cosine_ap": 0.5658036772817674,
+ "eval_Qnli-dev_cosine_f1": 0.6291834002677376,
+ "eval_Qnli-dev_cosine_f1_threshold": 0.7761930823326111,
+ "eval_Qnli-dev_cosine_precision": 0.4598825831702544,
+ "eval_Qnli-dev_cosine_recall": 0.9957627118644068,
+ "eval_Qnli-dev_dot_accuracy": 0.59375,
+ "eval_Qnli-dev_dot_accuracy_threshold": 724.091064453125,
+ "eval_Qnli-dev_dot_ap": 0.5657459555147606,
+ "eval_Qnli-dev_dot_f1": 0.6291834002677376,
+ "eval_Qnli-dev_dot_f1_threshold": 596.2498779296875,
+ "eval_Qnli-dev_dot_precision": 0.4598825831702544,
+ "eval_Qnli-dev_dot_recall": 0.9957627118644068,
+ "eval_Qnli-dev_euclidean_accuracy": 0.591796875,
+ "eval_Qnli-dev_euclidean_accuracy_threshold": 8.938886642456055,
+ "eval_Qnli-dev_euclidean_ap": 0.5658036772817674,
+ "eval_Qnli-dev_euclidean_f1": 0.6291834002677376,
+ "eval_Qnli-dev_euclidean_f1_threshold": 18.542938232421875,
+ "eval_Qnli-dev_euclidean_precision": 0.4598825831702544,
+ "eval_Qnli-dev_euclidean_recall": 0.9957627118644068,
+ "eval_Qnli-dev_manhattan_accuracy": 0.6171875,
+ "eval_Qnli-dev_manhattan_accuracy_threshold": 202.07958984375,
+ "eval_Qnli-dev_manhattan_ap": 0.5891966424964378,
+ "eval_Qnli-dev_manhattan_f1": 0.6291834002677376,
+ "eval_Qnli-dev_manhattan_f1_threshold": 307.9236145019531,
+ "eval_Qnli-dev_manhattan_precision": 0.4598825831702544,
+ "eval_Qnli-dev_manhattan_recall": 0.9957627118644068,
+ "eval_Qnli-dev_max_accuracy": 0.6171875,
+ "eval_Qnli-dev_max_accuracy_threshold": 724.091064453125,
+ "eval_Qnli-dev_max_ap": 0.5891966424964378,
+ "eval_Qnli-dev_max_f1": 0.6291834002677376,
+ "eval_Qnli-dev_max_f1_threshold": 596.2498779296875,
+ "eval_Qnli-dev_max_precision": 0.4598825831702544,
+ "eval_Qnli-dev_max_recall": 0.9957627118644068,
+ "eval_allNLI-dev_cosine_accuracy": 0.666015625,
+ "eval_allNLI-dev_cosine_accuracy_threshold": 0.9797871112823486,
+ "eval_allNLI-dev_cosine_ap": 0.4008449937025217,
+ "eval_allNLI-dev_cosine_f1": 0.504258943781942,
+ "eval_allNLI-dev_cosine_f1_threshold": 0.8929213285446167,
+ "eval_allNLI-dev_cosine_precision": 0.357487922705314,
+ "eval_allNLI-dev_cosine_recall": 0.8554913294797688,
+ "eval_allNLI-dev_dot_accuracy": 0.666015625,
+ "eval_allNLI-dev_dot_accuracy_threshold": 752.6634521484375,
+ "eval_allNLI-dev_dot_ap": 0.40071344979441287,
+ "eval_allNLI-dev_dot_f1": 0.504258943781942,
+ "eval_allNLI-dev_dot_f1_threshold": 685.9220581054688,
+ "eval_allNLI-dev_dot_precision": 0.357487922705314,
+ "eval_allNLI-dev_dot_recall": 0.8554913294797688,
+ "eval_allNLI-dev_euclidean_accuracy": 0.666015625,
+ "eval_allNLI-dev_euclidean_accuracy_threshold": 5.572628974914551,
+ "eval_allNLI-dev_euclidean_ap": 0.40083962142052487,
+ "eval_allNLI-dev_euclidean_f1": 0.504258943781942,
+ "eval_allNLI-dev_euclidean_f1_threshold": 12.826179504394531,
+ "eval_allNLI-dev_euclidean_precision": 0.357487922705314,
+ "eval_allNLI-dev_euclidean_recall": 0.8554913294797688,
+ "eval_allNLI-dev_manhattan_accuracy": 0.66796875,
+ "eval_allNLI-dev_manhattan_accuracy_threshold": 144.52613830566406,
+ "eval_allNLI-dev_manhattan_ap": 0.4008700157620745,
+ "eval_allNLI-dev_manhattan_f1": 0.5075987841945289,
+ "eval_allNLI-dev_manhattan_f1_threshold": 267.046875,
+ "eval_allNLI-dev_manhattan_precision": 0.3443298969072165,
+ "eval_allNLI-dev_manhattan_recall": 0.9653179190751445,
+ "eval_allNLI-dev_max_accuracy": 0.66796875,
+ "eval_allNLI-dev_max_accuracy_threshold": 752.6634521484375,
+ "eval_allNLI-dev_max_ap": 0.4008700157620745,
+ "eval_allNLI-dev_max_f1": 0.5075987841945289,
+ "eval_allNLI-dev_max_f1_threshold": 685.9220581054688,
+ "eval_allNLI-dev_max_precision": 0.357487922705314,
+ "eval_allNLI-dev_max_recall": 0.9653179190751445,
+ "eval_loss": 2.203232526779175,
+ "eval_runtime": 50.3824,
+ "eval_samples_per_second": 33.027,
+ "eval_sequential_score": 0.5891966424964378,
+ "eval_steps_per_second": 0.139,
+ "eval_sts-test_pearson_cosine": 0.1561600438268545,
+ "eval_sts-test_pearson_dot": 0.15588248423807516,
+ "eval_sts-test_pearson_euclidean": 0.1908690981304929,
+ "eval_sts-test_pearson_manhattan": 0.2216924674035587,
+ "eval_sts-test_pearson_max": 0.2216924674035587,
+ "eval_sts-test_spearman_cosine": 0.22356441354815124,
+ "eval_sts-test_spearman_dot": 0.22337189362164545,
+ "eval_sts-test_spearman_euclidean": 0.22363767136304896,
+ "eval_sts-test_spearman_manhattan": 0.24997065610359018,
+ "eval_sts-test_spearman_max": 0.24997065610359018,
+ "step": 153
+ },
1171
+ {
1172
+ "epoch": 0.1515748031496063,
1173
+ "grad_norm": 13.303340911865234,
1174
+ "learning_rate": 1.4811133200795232e-07,
1175
+ "loss": 3.9619,
1176
+ "step": 154
1177
+ },
1178
+ {
1179
+ "epoch": 0.15255905511811024,
1180
+ "grad_norm": 10.895444869995117,
1181
+ "learning_rate": 1.4910536779324058e-07,
1182
+ "loss": 3.1301,
1183
+ "step": 155
1184
+ },
1185
+ {
1186
+ "epoch": 0.15354330708661418,
1187
+ "grad_norm": 9.847289085388184,
1188
+ "learning_rate": 1.5009940357852886e-07,
1189
+ "loss": 3.0478,
1190
+ "step": 156
1191
+ },
1192
+ {
1193
+ "epoch": 0.1545275590551181,
1194
+ "grad_norm": 9.858047485351562,
1195
+ "learning_rate": 1.510934393638171e-07,
1196
+ "loss": 3.2986,
1197
+ "step": 157
1198
+ },
1199
+ {
1200
+ "epoch": 0.15551181102362205,
1201
+ "grad_norm": 11.554423332214355,
1202
+ "learning_rate": 1.5208747514910539e-07,
1203
+ "loss": 3.2847,
1204
+ "step": 158
1205
+ },
1206
+ {
1207
+ "epoch": 0.15649606299212598,
1208
+ "grad_norm": 11.145759582519531,
1209
+ "learning_rate": 1.5308151093439367e-07,
1210
+ "loss": 3.6599,
1211
+ "step": 159
1212
+ },
1213
+ {
1214
+ "epoch": 0.15748031496062992,
1215
+ "grad_norm": 9.598219871520996,
1216
+ "learning_rate": 1.5407554671968192e-07,
1217
+ "loss": 3.2238,
1218
+ "step": 160
1219
+ },
1220
+ {
1221
+ "epoch": 0.15846456692913385,
1222
+ "grad_norm": 11.533960342407227,
1223
+ "learning_rate": 1.5506958250497022e-07,
1224
+ "loss": 2.8897,
1225
+ "step": 161
1226
+ },
1227
+ {
1228
+ "epoch": 0.1594488188976378,
1229
+ "grad_norm": 12.17159652709961,
1230
+ "learning_rate": 1.5606361829025848e-07,
1231
+ "loss": 3.9443,
1232
+ "step": 162
1233
+ },
1234
+ {
1235
+ "epoch": 0.16043307086614172,
1236
+ "grad_norm": 10.684307098388672,
1237
+ "learning_rate": 1.5705765407554675e-07,
1238
+ "loss": 3.3733,
1239
+ "step": 163
1240
+ },
1241
+ {
1242
+ "epoch": 0.16141732283464566,
1243
+ "grad_norm": 12.093358039855957,
1244
+ "learning_rate": 1.58051689860835e-07,
1245
+ "loss": 3.7444,
1246
+ "step": 164
1247
+ },
1248
+ {
1249
+ "epoch": 0.1624015748031496,
1250
+ "grad_norm": 12.204547882080078,
1251
+ "learning_rate": 1.5904572564612329e-07,
1252
+ "loss": 3.4813,
1253
+ "step": 165
1254
+ },
1255
+ {
1256
+ "epoch": 0.16338582677165353,
1257
+ "grad_norm": 11.32477855682373,
1258
+ "learning_rate": 1.6003976143141154e-07,
1259
+ "loss": 2.6865,
1260
+ "step": 166
1261
+ },
1262
+ {
1263
+ "epoch": 0.1643700787401575,
1264
+ "grad_norm": 10.9214506149292,
1265
+ "learning_rate": 1.6103379721669984e-07,
1266
+ "loss": 2.7587,
1267
+ "step": 167
1268
+ },
1269
+ {
1270
+ "epoch": 0.16535433070866143,
1271
+ "grad_norm": 10.949960708618164,
1272
+ "learning_rate": 1.620278330019881e-07,
1273
+ "loss": 3.3628,
1274
+ "step": 168
1275
+ },
1276
+ {
1277
+ "epoch": 0.16633858267716536,
1278
+ "grad_norm": 12.229423522949219,
1279
+ "learning_rate": 1.6302186878727637e-07,
1280
+ "loss": 3.0035,
1281
+ "step": 169
1282
+ },
1283
+ {
1284
+ "epoch": 0.1673228346456693,
1285
+ "grad_norm": 77.52136993408203,
1286
+ "learning_rate": 1.6401590457256465e-07,
1287
+ "loss": 10.1591,
1288
+ "step": 170
1289
+ },
1290
+ {
1291
+ "epoch": 0.16830708661417323,
1292
+ "grad_norm": 13.680435180664062,
1293
+ "learning_rate": 1.650099403578529e-07,
1294
+ "loss": 3.5366,
1295
+ "step": 171
1296
+ },
1297
+ {
1298
+ "epoch": 0.16929133858267717,
1299
+ "grad_norm": 51.19025421142578,
1300
+ "learning_rate": 1.6600397614314118e-07,
1301
+ "loss": 8.4047,
1302
+ "step": 172
1303
+ },
1304
+ {
1305
+ "epoch": 0.1702755905511811,
1306
+ "grad_norm": 12.889047622680664,
1307
+ "learning_rate": 1.6699801192842944e-07,
1308
+ "loss": 3.8643,
1309
+ "step": 173
1310
+ },
1311
+ {
1312
+ "epoch": 0.17125984251968504,
1313
+ "grad_norm": 12.575840950012207,
1314
+ "learning_rate": 1.6799204771371774e-07,
1315
+ "loss": 3.3529,
1316
+ "step": 174
1317
+ },
1318
+ {
1319
+ "epoch": 0.17224409448818898,
1320
+ "grad_norm": 11.404970169067383,
1321
+ "learning_rate": 1.68986083499006e-07,
1322
+ "loss": 3.7143,
1323
+ "step": 175
1324
+ },
1325
+ {
1326
+ "epoch": 0.1732283464566929,
1327
+ "grad_norm": 11.78778076171875,
1328
+ "learning_rate": 1.6998011928429427e-07,
1329
+ "loss": 3.3323,
1330
+ "step": 176
1331
+ },
1332
+ {
1333
+ "epoch": 0.17421259842519685,
1334
+ "grad_norm": 11.07532787322998,
1335
+ "learning_rate": 1.7097415506958253e-07,
1336
+ "loss": 3.1206,
1337
+ "step": 177
1338
+ },
1339
+ {
1340
+ "epoch": 0.17519685039370078,
1341
+ "grad_norm": 13.040700912475586,
1342
+ "learning_rate": 1.719681908548708e-07,
1343
+ "loss": 3.1348,
1344
+ "step": 178
1345
+ },
1346
+ {
1347
+ "epoch": 0.17618110236220472,
1348
+ "grad_norm": 49.289466857910156,
1349
+ "learning_rate": 1.7296222664015906e-07,
1350
+ "loss": 7.6011,
1351
+ "step": 179
1352
+ },
1353
+ {
1354
+ "epoch": 0.17716535433070865,
1355
+ "grad_norm": 14.549922943115234,
1356
+ "learning_rate": 1.7395626242544734e-07,
1357
+ "loss": 3.7025,
1358
+ "step": 180
1359
+ },
1360
+ {
1361
+ "epoch": 0.1781496062992126,
1362
+ "grad_norm": 104.96332550048828,
1363
+ "learning_rate": 1.7495029821073564e-07,
1364
+ "loss": 10.5662,
1365
+ "step": 181
1366
+ },
1367
+ {
1368
+ "epoch": 0.17913385826771652,
1369
+ "grad_norm": 61.00530242919922,
1370
+ "learning_rate": 1.759443339960239e-07,
1371
+ "loss": 8.966,
1372
+ "step": 182
1373
+ },
1374
+ {
1375
+ "epoch": 0.18011811023622049,
1376
+ "grad_norm": 68.94571685791016,
1377
+ "learning_rate": 1.7693836978131217e-07,
1378
+ "loss": 9.426,
1379
+ "step": 183
1380
+ },
1381
+ {
1382
+ "epoch": 0.18110236220472442,
1383
+ "grad_norm": 11.119109153747559,
1384
+ "learning_rate": 1.7793240556660042e-07,
1385
+ "loss": 3.0025,
1386
+ "step": 184
1387
+ },
1388
+ {
1389
+ "epoch": 0.18208661417322836,
1390
+ "grad_norm": 34.247840881347656,
1391
+ "learning_rate": 1.789264413518887e-07,
1392
+ "loss": 7.0984,
1393
+ "step": 185
1394
+ },
1395
+ {
1396
+ "epoch": 0.1830708661417323,
1397
+ "grad_norm": 47.956634521484375,
1398
+ "learning_rate": 1.7992047713717695e-07,
1399
+ "loss": 7.3808,
1400
+ "step": 186
1401
+ },
1402
+ {
1403
+ "epoch": 0.18405511811023623,
1404
+ "grad_norm": 11.798696517944336,
1405
+ "learning_rate": 1.8091451292246523e-07,
1406
+ "loss": 2.8657,
1407
+ "step": 187
1408
+ },
1409
+ {
1410
+ "epoch": 0.18503937007874016,
1411
+ "grad_norm": 38.751102447509766,
1412
+ "learning_rate": 1.819085487077535e-07,
1413
+ "loss": 6.5636,
1414
+ "step": 188
1415
+ },
1416
+ {
1417
+ "epoch": 0.1860236220472441,
1418
+ "grad_norm": 14.265003204345703,
1419
+ "learning_rate": 1.829025844930418e-07,
1420
+ "loss": 3.4702,
1421
+ "step": 189
1422
+ },
1423
+ {
1424
+ "epoch": 0.18700787401574803,
1425
+ "grad_norm": 35.365360260009766,
1426
+ "learning_rate": 1.8389662027833004e-07,
1427
+ "loss": 5.9302,
1428
+ "step": 190
1429
+ },
1430
+ {
1431
+ "epoch": 0.18799212598425197,
1432
+ "grad_norm": 10.978341102600098,
1433
+ "learning_rate": 1.8489065606361832e-07,
1434
+ "loss": 3.2406,
1435
+ "step": 191
1436
+ },
1437
+ {
1438
+ "epoch": 0.1889763779527559,
1439
+ "grad_norm": 12.03227710723877,
1440
+ "learning_rate": 1.8588469184890657e-07,
1441
+ "loss": 3.4459,
1442
+ "step": 192
1443
+ },
1444
+ {
1445
+ "epoch": 0.18996062992125984,
1446
+ "grad_norm": 21.640823364257812,
1447
+ "learning_rate": 1.8687872763419485e-07,
1448
+ "loss": 5.269,
1449
+ "step": 193
1450
+ },
1451
+ {
1452
+ "epoch": 0.19094488188976377,
1453
+ "grad_norm": 21.88094139099121,
1454
+ "learning_rate": 1.8787276341948313e-07,
1455
+ "loss": 4.8605,
1456
+ "step": 194
1457
+ },
1458
+ {
1459
+ "epoch": 0.1919291338582677,
1460
+ "grad_norm": 10.063691139221191,
1461
+ "learning_rate": 1.888667992047714e-07,
1462
+ "loss": 2.9891,
1463
+ "step": 195
1464
+ },
1465
+ {
1466
+ "epoch": 0.19291338582677164,
1467
+ "grad_norm": 10.375823974609375,
1468
+ "learning_rate": 1.898608349900597e-07,
1469
+ "loss": 3.6681,
1470
+ "step": 196
1471
+ },
1472
+ {
1473
+ "epoch": 0.19389763779527558,
1474
+ "grad_norm": 10.918342590332031,
1475
+ "learning_rate": 1.9085487077534794e-07,
1476
+ "loss": 3.1589,
1477
+ "step": 197
1478
+ },
1479
+ {
1480
+ "epoch": 0.19488188976377951,
1481
+ "grad_norm": 12.367669105529785,
1482
+ "learning_rate": 1.9184890656063622e-07,
1483
+ "loss": 3.1835,
1484
+ "step": 198
1485
+ },
1486
+ {
1487
+ "epoch": 0.19586614173228348,
1488
+ "grad_norm": 12.073822021484375,
1489
+ "learning_rate": 1.9284294234592447e-07,
1490
+ "loss": 3.7561,
1491
+ "step": 199
1492
+ },
1493
+ {
1494
+ "epoch": 0.1968503937007874,
1495
+ "grad_norm": 12.768759727478027,
1496
+ "learning_rate": 1.9383697813121275e-07,
1497
+ "loss": 4.0891,
1498
+ "step": 200
1499
+ },
1500
+ {
1501
+ "epoch": 0.19783464566929135,
1502
+ "grad_norm": 12.075078964233398,
1503
+ "learning_rate": 1.94831013916501e-07,
1504
+ "loss": 3.563,
1505
+ "step": 201
1506
+ },
1507
+ {
1508
+ "epoch": 0.19881889763779528,
1509
+ "grad_norm": 13.486553192138672,
1510
+ "learning_rate": 1.958250497017893e-07,
1511
+ "loss": 3.7433,
1512
+ "step": 202
1513
+ },
1514
+ {
1515
+ "epoch": 0.19980314960629922,
1516
+ "grad_norm": 12.1553955078125,
1517
+ "learning_rate": 1.9681908548707756e-07,
1518
+ "loss": 3.3813,
1519
+ "step": 203
1520
+ },
1521
+ {
1522
+ "epoch": 0.20078740157480315,
1523
+ "grad_norm": 23.534427642822266,
1524
+ "learning_rate": 1.9781312127236584e-07,
1525
+ "loss": 5.2311,
1526
+ "step": 204
1527
+ },
1528
+ {
1529
+ "epoch": 0.2017716535433071,
1530
+ "grad_norm": 11.403944969177246,
1531
+ "learning_rate": 1.9880715705765412e-07,
1532
+ "loss": 3.3494,
1533
+ "step": 205
1534
+ },
1535
+ {
1536
+ "epoch": 0.20275590551181102,
1537
+ "grad_norm": 10.039087295532227,
1538
+ "learning_rate": 1.9980119284294237e-07,
1539
+ "loss": 3.3533,
1540
+ "step": 206
1541
+ },
1542
+ {
1543
+ "epoch": 0.20374015748031496,
1544
+ "grad_norm": 11.510744094848633,
1545
+ "learning_rate": 2.0079522862823065e-07,
1546
+ "loss": 3.688,
1547
+ "step": 207
1548
+ },
1549
+ {
1550
+ "epoch": 0.2047244094488189,
1551
+ "grad_norm": 10.009934425354004,
1552
+ "learning_rate": 2.017892644135189e-07,
1553
+ "loss": 3.5342,
1554
+ "step": 208
1555
+ },
1556
+ {
1557
+ "epoch": 0.20570866141732283,
1558
+ "grad_norm": 21.598899841308594,
1559
+ "learning_rate": 2.027833001988072e-07,
1560
+ "loss": 4.9381,
1561
+ "step": 209
1562
+ },
1563
+ {
1564
+ "epoch": 0.20669291338582677,
1565
+ "grad_norm": 10.349525451660156,
1566
+ "learning_rate": 2.0377733598409546e-07,
1567
+ "loss": 3.1839,
1568
+ "step": 210
1569
+ },
1570
+ {
1571
+ "epoch": 0.2076771653543307,
1572
+ "grad_norm": 10.374922752380371,
1573
+ "learning_rate": 2.0477137176938374e-07,
1574
+ "loss": 3.0465,
1575
+ "step": 211
1576
+ },
1577
+ {
1578
+ "epoch": 0.20866141732283464,
1579
+ "grad_norm": 12.201885223388672,
1580
+ "learning_rate": 2.05765407554672e-07,
1581
+ "loss": 3.1232,
1582
+ "step": 212
1583
+ },
1584
+ {
1585
+ "epoch": 0.20964566929133857,
1586
+ "grad_norm": 21.732847213745117,
1587
+ "learning_rate": 2.0675944333996027e-07,
1588
+ "loss": 4.6297,
1589
+ "step": 213
1590
+ },
1591
+ {
1592
+ "epoch": 0.2106299212598425,
1593
+ "grad_norm": 9.907038688659668,
1594
+ "learning_rate": 2.0775347912524852e-07,
1595
+ "loss": 2.9834,
1596
+ "step": 214
1597
+ },
1598
+ {
1599
+ "epoch": 0.21161417322834647,
1600
+ "grad_norm": 17.139137268066406,
1601
+ "learning_rate": 2.087475149105368e-07,
1602
+ "loss": 4.2231,
1603
+ "step": 215
1604
+ },
1605
+ {
1606
+ "epoch": 0.2125984251968504,
1607
+ "grad_norm": 12.830426216125488,
1608
+ "learning_rate": 2.097415506958251e-07,
1609
+ "loss": 3.1458,
1610
+ "step": 216
1611
+ },
1612
+ {
1613
+ "epoch": 0.21358267716535434,
1614
+ "grad_norm": 10.250080108642578,
1615
+ "learning_rate": 2.1073558648111336e-07,
1616
+ "loss": 3.2525,
1617
+ "step": 217
1618
+ },
1619
+ {
1620
+ "epoch": 0.21456692913385828,
1621
+ "grad_norm": 11.067953109741211,
1622
+ "learning_rate": 2.1172962226640164e-07,
1623
+ "loss": 3.5971,
1624
+ "step": 218
1625
+ },
1626
+ {
1627
+ "epoch": 0.2155511811023622,
1628
+ "grad_norm": 10.0042724609375,
1629
+ "learning_rate": 2.127236580516899e-07,
1630
+ "loss": 3.5616,
1631
+ "step": 219
1632
+ },
1633
+ {
1634
+ "epoch": 0.21653543307086615,
1635
+ "grad_norm": 10.706488609313965,
1636
+ "learning_rate": 2.1371769383697817e-07,
1637
+ "loss": 3.2378,
1638
+ "step": 220
1639
+ },
1640
+ {
1641
+ "epoch": 0.21751968503937008,
1642
+ "grad_norm": 12.0921630859375,
1643
+ "learning_rate": 2.1471172962226642e-07,
1644
+ "loss": 2.9075,
1645
+ "step": 221
1646
+ },
1647
+ {
1648
+ "epoch": 0.21850393700787402,
1649
+ "grad_norm": 11.22988510131836,
1650
+ "learning_rate": 2.1570576540755473e-07,
1651
+ "loss": 3.0391,
1652
+ "step": 222
1653
+ },
1654
+ {
1655
+ "epoch": 0.21948818897637795,
1656
+ "grad_norm": 11.993961334228516,
1657
+ "learning_rate": 2.1669980119284298e-07,
1658
+ "loss": 3.5573,
1659
+ "step": 223
1660
+ },
1661
+ {
1662
+ "epoch": 0.2204724409448819,
1663
+ "grad_norm": 11.548995971679688,
1664
+ "learning_rate": 2.1769383697813126e-07,
1665
+ "loss": 3.2092,
1666
+ "step": 224
1667
+ },
1668
+ {
1669
+ "epoch": 0.22145669291338582,
1670
+ "grad_norm": 9.982552528381348,
1671
+ "learning_rate": 2.186878727634195e-07,
1672
+ "loss": 3.2646,
1673
+ "step": 225
1674
+ },
1675
+ {
1676
+ "epoch": 0.22244094488188976,
1677
+ "grad_norm": 9.191473007202148,
1678
+ "learning_rate": 2.196819085487078e-07,
1679
+ "loss": 3.0886,
1680
+ "step": 226
1681
+ },
1682
+ {
1683
+ "epoch": 0.2234251968503937,
1684
+ "grad_norm": 14.892210006713867,
1685
+ "learning_rate": 2.2067594433399604e-07,
1686
+ "loss": 3.5241,
1687
+ "step": 227
1688
+ },
1689
+ {
1690
+ "epoch": 0.22440944881889763,
1691
+ "grad_norm": 12.9964599609375,
1692
+ "learning_rate": 2.2166998011928432e-07,
1693
+ "loss": 3.0111,
1694
+ "step": 228
1695
+ },
1696
+ {
1697
+ "epoch": 0.22539370078740156,
1698
+ "grad_norm": 14.988781929016113,
1699
+ "learning_rate": 2.2266401590457263e-07,
1700
+ "loss": 3.707,
1701
+ "step": 229
1702
+ },
1703
+ {
1704
+ "epoch": 0.2263779527559055,
1705
+ "grad_norm": 35.95463180541992,
1706
+ "learning_rate": 2.2365805168986088e-07,
1707
+ "loss": 5.3822,
1708
+ "step": 230
1709
+ },
1710
+ {
1711
+ "epoch": 0.22736220472440946,
1712
+ "grad_norm": 14.787928581237793,
1713
+ "learning_rate": 2.2465208747514916e-07,
1714
+ "loss": 3.2646,
1715
+ "step": 231
1716
+ },
1717
+ {
1718
+ "epoch": 0.2283464566929134,
1719
+ "grad_norm": 10.399752616882324,
1720
+ "learning_rate": 2.256461232604374e-07,
1721
+ "loss": 2.7021,
1722
+ "step": 232
1723
+ },
1724
+ {
1725
+ "epoch": 0.22933070866141733,
1726
+ "grad_norm": 12.82553482055664,
1727
+ "learning_rate": 2.266401590457257e-07,
1728
+ "loss": 3.5131,
1729
+ "step": 233
1730
+ },
1731
+ {
1732
+ "epoch": 0.23031496062992127,
1733
+ "grad_norm": 13.701425552368164,
1734
+ "learning_rate": 2.2763419483101394e-07,
1735
+ "loss": 3.103,
1736
+ "step": 234
1737
+ },
1738
+ {
1739
+ "epoch": 0.2312992125984252,
1740
+ "grad_norm": 11.487199783325195,
1741
+ "learning_rate": 2.2862823061630222e-07,
1742
+ "loss": 2.9535,
1743
+ "step": 235
1744
+ },
1745
+ {
1746
+ "epoch": 0.23228346456692914,
1747
+ "grad_norm": 11.957090377807617,
1748
+ "learning_rate": 2.296222664015905e-07,
1749
+ "loss": 2.9631,
1750
+ "step": 236
1751
+ },
1752
+ {
1753
+ "epoch": 0.23326771653543307,
1754
+ "grad_norm": 11.277992248535156,
1755
+ "learning_rate": 2.3061630218687878e-07,
1756
+ "loss": 2.8068,
1757
+ "step": 237
1758
+ },
1759
+ {
1760
+ "epoch": 0.234251968503937,
1761
+ "grad_norm": 14.584067344665527,
1762
+ "learning_rate": 2.3161033797216703e-07,
1763
+ "loss": 3.4251,
1764
+ "step": 238
1765
+ },
1766
+ {
1767
+ "epoch": 0.23523622047244094,
1768
+ "grad_norm": 12.476365089416504,
1769
+ "learning_rate": 2.326043737574553e-07,
1770
+ "loss": 2.8495,
1771
+ "step": 239
1772
+ },
1773
+ {
1774
+ "epoch": 0.23622047244094488,
1775
+ "grad_norm": 10.975320816040039,
1776
+ "learning_rate": 2.335984095427436e-07,
1777
+ "loss": 2.9972,
1778
+ "step": 240
1779
+ },
1780
+ {
1781
+ "epoch": 0.2372047244094488,
1782
+ "grad_norm": 15.740063667297363,
1783
+ "learning_rate": 2.3459244532803184e-07,
1784
+ "loss": 3.3509,
1785
+ "step": 241
1786
+ },
1787
+ {
1788
+ "epoch": 0.23818897637795275,
1789
+ "grad_norm": 11.939135551452637,
1790
+ "learning_rate": 2.3558648111332012e-07,
1791
+ "loss": 2.9234,
1792
+ "step": 242
1793
+ },
1794
+ {
1795
+ "epoch": 0.23917322834645668,
1796
+ "grad_norm": 11.509920120239258,
1797
+ "learning_rate": 2.365805168986084e-07,
1798
+ "loss": 2.4086,
1799
+ "step": 243
1800
+ },
1801
+ {
1802
+ "epoch": 0.24015748031496062,
1803
+ "grad_norm": 12.131606101989746,
1804
+ "learning_rate": 2.3757455268389668e-07,
1805
+ "loss": 3.1282,
1806
+ "step": 244
1807
+ },
1808
+ {
1809
+ "epoch": 0.24114173228346455,
1810
+ "grad_norm": 10.977266311645508,
1811
+ "learning_rate": 2.385685884691849e-07,
1812
+ "loss": 2.3352,
1813
+ "step": 245
1814
+ },
1815
+ {
1816
+ "epoch": 0.2421259842519685,
1817
+ "grad_norm": 13.836033821105957,
1818
+ "learning_rate": 2.395626242544732e-07,
1819
+ "loss": 2.4706,
1820
+ "step": 246
1821
+ },
1822
+ {
1823
+ "epoch": 0.24311023622047245,
1824
+ "grad_norm": 14.244443893432617,
1825
+ "learning_rate": 2.4055666003976146e-07,
1826
+ "loss": 3.5449,
1827
+ "step": 247
1828
+ },
1829
+ {
1830
+ "epoch": 0.2440944881889764,
1831
+ "grad_norm": 13.271723747253418,
1832
+ "learning_rate": 2.4155069582504976e-07,
1833
+ "loss": 2.8963,
1834
+ "step": 248
1835
+ },
1836
+ {
1837
+ "epoch": 0.24507874015748032,
1838
+ "grad_norm": 11.74501895904541,
1839
+ "learning_rate": 2.42544731610338e-07,
1840
+ "loss": 2.773,
1841
+ "step": 249
1842
+ },
1843
+ {
1844
+ "epoch": 0.24606299212598426,
1845
+ "grad_norm": 11.362481117248535,
1846
+ "learning_rate": 2.4353876739562627e-07,
1847
+ "loss": 2.355,
1848
+ "step": 250
1849
+ },
1850
+ {
1851
+ "epoch": 0.2470472440944882,
1852
+ "grad_norm": 14.688889503479004,
1853
+ "learning_rate": 2.445328031809146e-07,
1854
+ "loss": 2.656,
1855
+ "step": 251
1856
+ },
1857
+ {
1858
+ "epoch": 0.24803149606299213,
1859
+ "grad_norm": 10.654073715209961,
1860
+ "learning_rate": 2.4552683896620283e-07,
1861
+ "loss": 2.6221,
1862
+ "step": 252
1863
+ },
1864
+ {
1865
+ "epoch": 0.24901574803149606,
1866
+ "grad_norm": 54.549888610839844,
1867
+ "learning_rate": 2.4652087475149113e-07,
1868
+ "loss": 8.6739,
1869
+ "step": 253
1870
+ },
1871
+ {
1872
+ "epoch": 0.25,
1873
+ "grad_norm": 96.43404388427734,
1874
+ "learning_rate": 2.475149105367794e-07,
1875
+ "loss": 10.8242,
1876
+ "step": 254
1877
+ },
1878
+ {
1879
+ "epoch": 0.25098425196850394,
1880
+ "grad_norm": 12.798384666442871,
1881
+ "learning_rate": 2.4850894632206764e-07,
1882
+ "loss": 2.3408,
1883
+ "step": 255
1884
+ },
1885
+ {
1886
+ "epoch": 0.25196850393700787,
1887
+ "grad_norm": 11.494848251342773,
1888
+ "learning_rate": 2.495029821073559e-07,
1889
+ "loss": 2.1221,
1890
+ "step": 256
1891
+ },
1892
+ {
1893
+ "epoch": 0.2529527559055118,
1894
+ "grad_norm": 18.67085075378418,
1895
+ "learning_rate": 2.5049701789264414e-07,
1896
+ "loss": 3.295,
1897
+ "step": 257
1898
+ },
1899
+ {
1900
+ "epoch": 0.25393700787401574,
1901
+ "grad_norm": 12.794750213623047,
1902
+ "learning_rate": 2.5149105367793245e-07,
1903
+ "loss": 2.5896,
1904
+ "step": 258
1905
+ },
1906
+ {
1907
+ "epoch": 0.2549212598425197,
1908
+ "grad_norm": 13.11160659790039,
1909
+ "learning_rate": 2.524850894632207e-07,
1910
+ "loss": 2.1215,
1911
+ "step": 259
1912
+ },
1913
+ {
1914
+ "epoch": 0.2559055118110236,
1915
+ "grad_norm": 82.75247192382812,
1916
+ "learning_rate": 2.53479125248509e-07,
1917
+ "loss": 9.4851,
1918
+ "step": 260
1919
+ },
1920
+ {
1921
+ "epoch": 0.25688976377952755,
1922
+ "grad_norm": 10.959670066833496,
1923
+ "learning_rate": 2.5447316103379726e-07,
1924
+ "loss": 2.1982,
1925
+ "step": 261
1926
+ },
1927
+ {
1928
+ "epoch": 0.2578740157480315,
1929
+ "grad_norm": 35.51136016845703,
1930
+ "learning_rate": 2.554671968190855e-07,
1931
+ "loss": 3.0568,
1932
+ "step": 262
1933
+ },
1934
+ {
1935
+ "epoch": 0.2588582677165354,
1936
+ "grad_norm": 11.970784187316895,
1937
+ "learning_rate": 2.564612326043738e-07,
1938
+ "loss": 2.6269,
1939
+ "step": 263
1940
+ },
1941
+ {
1942
+ "epoch": 0.25984251968503935,
1943
+ "grad_norm": 13.664376258850098,
1944
+ "learning_rate": 2.5745526838966207e-07,
1945
+ "loss": 2.4792,
1946
+ "step": 264
1947
+ },
1948
+ {
1949
+ "epoch": 0.2608267716535433,
1950
+ "grad_norm": 10.64925479888916,
1951
+ "learning_rate": 2.5844930417495037e-07,
1952
+ "loss": 1.9445,
1953
+ "step": 265
1954
+ },
1955
+ {
1956
+ "epoch": 0.2618110236220472,
1957
+ "grad_norm": 13.321276664733887,
1958
+ "learning_rate": 2.5944333996023857e-07,
1959
+ "loss": 2.4061,
1960
+ "step": 266
1961
+ },
1962
+ {
1963
+ "epoch": 0.26279527559055116,
1964
+ "grad_norm": 77.70325469970703,
1965
+ "learning_rate": 2.604373757455269e-07,
1966
+ "loss": 8.3116,
1967
+ "step": 267
1968
+ },
1969
+ {
1970
+ "epoch": 0.2637795275590551,
1971
+ "grad_norm": 53.12438201904297,
1972
+ "learning_rate": 2.614314115308152e-07,
1973
+ "loss": 8.0804,
1974
+ "step": 268
1975
+ },
1976
+ {
1977
+ "epoch": 0.26476377952755903,
1978
+ "grad_norm": 10.435575485229492,
1979
+ "learning_rate": 2.6242544731610343e-07,
1980
+ "loss": 2.1674,
1981
+ "step": 269
1982
+ },
1983
+ {
1984
+ "epoch": 0.265748031496063,
1985
+ "grad_norm": 51.96613311767578,
1986
+ "learning_rate": 2.634194831013917e-07,
1987
+ "loss": 7.1975,
1988
+ "step": 270
1989
+ },
1990
+ {
1991
+ "epoch": 0.26673228346456695,
1992
+ "grad_norm": 46.066497802734375,
1993
+ "learning_rate": 2.6441351888667994e-07,
1994
+ "loss": 5.9104,
1995
+ "step": 271
1996
+ },
1997
+ {
1998
+ "epoch": 0.2677165354330709,
1999
+ "grad_norm": 11.553542137145996,
2000
+ "learning_rate": 2.6540755467196824e-07,
2001
+ "loss": 2.498,
2002
+ "step": 272
2003
+ },
2004
+ {
2005
+ "epoch": 0.2687007874015748,
2006
+ "grad_norm": 12.689590454101562,
2007
+ "learning_rate": 2.664015904572565e-07,
2008
+ "loss": 2.5249,
2009
+ "step": 273
2010
+ },
2011
+ {
2012
+ "epoch": 0.26968503937007876,
2013
+ "grad_norm": 11.272449493408203,
2014
+ "learning_rate": 2.6739562624254475e-07,
2015
+ "loss": 2.7152,
2016
+ "step": 274
2017
+ },
2018
+ {
2019
+ "epoch": 0.2706692913385827,
2020
+ "grad_norm": 11.380216598510742,
2021
+ "learning_rate": 2.6838966202783305e-07,
2022
+ "loss": 2.7904,
2023
+ "step": 275
2024
+ },
2025
+ {
2026
+ "epoch": 0.27165354330708663,
2027
+ "grad_norm": 13.75843334197998,
2028
+ "learning_rate": 2.693836978131213e-07,
2029
+ "loss": 2.7745,
2030
+ "step": 276
2031
+ },
2032
+ {
2033
+ "epoch": 0.27263779527559057,
2034
+ "grad_norm": 13.37204647064209,
2035
+ "learning_rate": 2.703777335984096e-07,
2036
+ "loss": 2.9741,
2037
+ "step": 277
2038
+ },
2039
+ {
2040
+ "epoch": 0.2736220472440945,
2041
+ "grad_norm": 11.115955352783203,
2042
+ "learning_rate": 2.7137176938369786e-07,
2043
+ "loss": 1.8215,
2044
+ "step": 278
2045
+ },
2046
+ {
2047
+ "epoch": 0.27460629921259844,
2048
+ "grad_norm": 26.816680908203125,
2049
+ "learning_rate": 2.723658051689861e-07,
2050
+ "loss": 4.6844,
2051
+ "step": 279
2052
+ },
2053
+ {
2054
+ "epoch": 0.2755905511811024,
2055
+ "grad_norm": 12.469667434692383,
2056
+ "learning_rate": 2.7335984095427437e-07,
2057
+ "loss": 2.8613,
2058
+ "step": 280
2059
+ },
2060
+ {
2061
+ "epoch": 0.2765748031496063,
2062
+ "grad_norm": 12.972196578979492,
2063
+ "learning_rate": 2.743538767395627e-07,
2064
+ "loss": 2.7147,
2065
+ "step": 281
2066
+ },
2067
+ {
2068
+ "epoch": 0.27755905511811024,
2069
+ "grad_norm": 12.858671188354492,
2070
+ "learning_rate": 2.75347912524851e-07,
2071
+ "loss": 2.814,
2072
+ "step": 282
2073
+ },
2074
+ {
2075
+ "epoch": 0.2785433070866142,
2076
+ "grad_norm": 10.301759719848633,
2077
+ "learning_rate": 2.763419483101392e-07,
2078
+ "loss": 2.3569,
2079
+ "step": 283
2080
+ },
2081
+ {
2082
+ "epoch": 0.2795275590551181,
2083
+ "grad_norm": 12.948614120483398,
2084
+ "learning_rate": 2.773359840954275e-07,
2085
+ "loss": 2.672,
2086
+ "step": 284
2087
+ },
2088
+ {
2089
+ "epoch": 0.28051181102362205,
2090
+ "grad_norm": 16.839580535888672,
2091
+ "learning_rate": 2.7833001988071574e-07,
2092
+ "loss": 3.2052,
2093
+ "step": 285
2094
+ },
2095
+ {
2096
+ "epoch": 0.281496062992126,
2097
+ "grad_norm": 12.991905212402344,
2098
+ "learning_rate": 2.7932405566600404e-07,
2099
+ "loss": 2.8056,
2100
+ "step": 286
2101
+ },
2102
+ {
2103
+ "epoch": 0.2824803149606299,
2104
+ "grad_norm": 12.984047889709473,
2105
+ "learning_rate": 2.803180914512923e-07,
2106
+ "loss": 2.6268,
2107
+ "step": 287
2108
+ },
2109
+ {
2110
+ "epoch": 0.28346456692913385,
2111
+ "grad_norm": 12.771299362182617,
2112
+ "learning_rate": 2.8131212723658055e-07,
2113
+ "loss": 2.5641,
2114
+ "step": 288
2115
+ },
2116
+ {
2117
+ "epoch": 0.2844488188976378,
2118
+ "grad_norm": 12.763790130615234,
2119
+ "learning_rate": 2.8230616302186885e-07,
2120
+ "loss": 2.4475,
2121
+ "step": 289
2122
+ },
2123
+ {
2124
+ "epoch": 0.2854330708661417,
2125
+ "grad_norm": 12.817102432250977,
2126
+ "learning_rate": 2.833001988071571e-07,
2127
+ "loss": 2.7377,
2128
+ "step": 290
2129
+ },
2130
+ {
2131
+ "epoch": 0.28641732283464566,
2132
+ "grad_norm": 11.62403678894043,
2133
+ "learning_rate": 2.842942345924454e-07,
2134
+ "loss": 2.3831,
2135
+ "step": 291
2136
+ },
2137
+ {
2138
+ "epoch": 0.2874015748031496,
2139
+ "grad_norm": 88.97967529296875,
2140
+ "learning_rate": 2.852882703777336e-07,
2141
+ "loss": 8.8069,
2142
+ "step": 292
2143
+ },
2144
+ {
2145
+ "epoch": 0.28838582677165353,
2146
+ "grad_norm": 12.380749702453613,
2147
+ "learning_rate": 2.862823061630219e-07,
2148
+ "loss": 2.186,
2149
+ "step": 293
2150
+ },
2151
+ {
2152
+ "epoch": 0.28937007874015747,
2153
+ "grad_norm": 12.181745529174805,
2154
+ "learning_rate": 2.8727634194831017e-07,
2155
+ "loss": 2.3389,
2156
+ "step": 294
2157
+ },
2158
+ {
2159
+ "epoch": 0.2903543307086614,
2160
+ "grad_norm": 11.23538875579834,
2161
+ "learning_rate": 2.8827037773359847e-07,
2162
+ "loss": 1.9744,
2163
+ "step": 295
2164
+ },
2165
+ {
2166
+ "epoch": 0.29133858267716534,
2167
+ "grad_norm": 13.454959869384766,
2168
+ "learning_rate": 2.892644135188867e-07,
2169
+ "loss": 2.4491,
2170
+ "step": 296
2171
+ },
2172
+ {
2173
+ "epoch": 0.29232283464566927,
2174
+ "grad_norm": 12.226387977600098,
2175
+ "learning_rate": 2.90258449304175e-07,
2176
+ "loss": 2.5668,
2177
+ "step": 297
2178
+ },
2179
+ {
2180
+ "epoch": 0.2933070866141732,
2181
+ "grad_norm": 13.08324146270752,
2182
+ "learning_rate": 2.912524850894633e-07,
2183
+ "loss": 2.1939,
2184
+ "step": 298
2185
+ },
2186
+ {
2187
+ "epoch": 0.29429133858267714,
2188
+ "grad_norm": 12.1226224899292,
2189
+ "learning_rate": 2.9224652087475153e-07,
2190
+ "loss": 2.2832,
2191
+ "step": 299
2192
+ },
2193
+ {
2194
+ "epoch": 0.2952755905511811,
2195
+ "grad_norm": 12.738725662231445,
2196
+ "learning_rate": 2.9324055666003984e-07,
2197
+ "loss": 2.7508,
2198
+ "step": 300
2199
+ },
2200
+ {
2201
+ "epoch": 0.296259842519685,
2202
+ "grad_norm": 13.919729232788086,
2203
+ "learning_rate": 2.9423459244532804e-07,
2204
+ "loss": 2.5206,
2205
+ "step": 301
2206
+ },
2207
+ {
2208
+ "epoch": 0.297244094488189,
2209
+ "grad_norm": 13.623347282409668,
2210
+ "learning_rate": 2.9522862823061634e-07,
2211
+ "loss": 2.3522,
2212
+ "step": 302
2213
+ },
2214
+ {
2215
+ "epoch": 0.29822834645669294,
2216
+ "grad_norm": 15.347495079040527,
2217
+ "learning_rate": 2.9622266401590465e-07,
2218
+ "loss": 2.7186,
2219
+ "step": 303
2220
+ },
2221
+ {
2222
+ "epoch": 0.2992125984251969,
2223
+ "grad_norm": 13.009486198425293,
2224
+ "learning_rate": 2.972166998011929e-07,
2225
+ "loss": 2.1369,
2226
+ "step": 304
2227
+ },
2228
+ {
2229
+ "epoch": 0.3001968503937008,
2230
+ "grad_norm": 77.14462280273438,
2231
+ "learning_rate": 2.9821073558648115e-07,
2232
+ "loss": 9.7972,
2233
+ "step": 305
2234
+ }
+ ],
+ "logging_steps": 1,
+ "max_steps": 3048,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 3,
+ "save_steps": 305,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 0.0,
+ "train_batch_size": 32,
+ "trial_name": null,
+ "trial_params": null
+ }
checkpoint-305/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0fdb17fe2085f73bdf532d3d0579a9cf6fc49d8c98add89223c5b3cc0f4b11cf
+ size 5624