Commit 907ec7a by ndpphuong (parent: 3765b4e)

Upload folder using huggingface_hub

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
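Only `pooling_mode_mean_tokens` is enabled in the pooling config above, so each sentence vector is the average of the 768-dimensional token embeddings over the positions where the attention mask is 1. The sketch below shows that masked mean in plain Python; `mean_pool` and the toy 2-dimensional numbers are illustrative, not part of the Sentence Transformers API.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1; padding is ignored."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                summed[i] += value
            count += 1
    # Avoid division by zero if every position is padding
    count = max(count, 1)
    return [s / count for s in summed]


# Three tokens with toy 2-d embeddings; the last one is padding and must not count.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```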
README.md ADDED
@@ -0,0 +1,544 @@
+ ---
+ language: []
+ library_name: sentence-transformers
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:362208
+ - loss:ContrastiveLoss
+ base_model: vinai/phobert-base-v2
+ datasets: []
+ widget:
+ - source_sentence: vùng khí_hậu nóng ẩm ghi_nhận sự tồn_tại của Virus SARS-CoV -2
+     .
+   sentences:
+   - vùng khí_hậu nóng ẩm ghi_nhận sự tồn_tại của Virus SARS-CoV -2 .
+   - Chế_độ ăn_uống nghỉ_ngơi như_thế_nào là hợp_lý ?
+   - Và bé đang có tình_trạng bị đọng cặn nước_tiểu ở đầu bộ sinh_dục ( bé trai ) .
+ - source_sentence: Khoan xương và chèn vào hai bên hai bong_bóng được gọi là đệm xương
+     .
+   sentences:
+   - Khoan xương và chèn vào hai bên hai bong_bóng được gọi là đệm xương .
+   - 3 hôm_nay bé đi phân chủ_yếu là nhầy có bọt rồi nước lẫn hoa_cà_hoa_cải , thi_thoảng
+     phân có màu xanh đậm .
+   - Sau khi chẩn_đoán , bác_sĩ khuyên anh Khánh nên sớm điều_trị kịp_thời để tránh
+     nhiều biến_chứng .
+ - source_sentence: Những phụ_nữ có tiền_sử bệnh như trên chính là những đối_tượng
+     nguy_cơ của tình_trạng tắc ống dẫn trứng .
+   sentences:
+   - 'Dùng các thiết_bị hỗ_trợ quá_trình di_chuyển đồng_thời giúp cải_thiện chức_năng
+     của các khớp như :'
+   - Những phụ_nữ có tiền_sử bệnh như trên chính là những đối_tượng nguy_cơ của tình_trạng
+     tắc ống dẫn trứng .
+   - Trong các nguyên_nhân sau đây , đâu là các nguyên_nhân khách_quan , không đến
+     từ mẹ và thai_nhi ?
+ - source_sentence: Bé nhà con nay được 1 tháng 23 ngày .
+   sentences:
+   - 'Vú phải : - bất đối_xứng ở vùng dưới ( kích_thước :'
+   - Thưa bác_sĩ tôi 18 tuổi , bị sưng chân răng , nhức tai , nhức đầu với có kêu tiếng
+     trong quai_hàm .
+   - Bé nhà con nay được 1 tháng 23 ngày .
+ - source_sentence: Tuy_nhiên , nếu bệnh không tự lành và vẫn tiếp_tục chảy_máu , cần
+     phải sử_dụng các liệu_pháp cầm máu để bù lại lượng máu đã mất .
+   sentences:
+   - Nguyễn_Thị_Thanh_Tuyền ( 1995 ) .
+   - Bệnh_viện Bệnh Nhiệt_đới Trung_ương cơ_sở Kim_Chung là bệnh_viện khám_chữa bệnh
+     đa_khoa phục_vụ cho người_dân trong cả nước .
+   - 'Một_số yếu_tố làm tăng nguy_cơ mắc bệnh như : Yếu_tố nội_tiết : bệnh thường gặp
+     ở phụ_nữ chậm có kinh và sớm mãn_kinh .'
+ pipeline_tag: sentence-similarity
+ ---
+
+ # SentenceTransformer based on vinai/phobert-base-v2
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [vinai/phobert-base-v2](https://huggingface.co/vinai/phobert-base-v2). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [vinai/phobert-base-v2](https://huggingface.co/vinai/phobert-base-v2) <!-- at revision 2b51e367d92093c9688112098510e6a58bab67cd -->
+ - **Maximum Sequence Length:** 256 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: RobertaModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub (replace the placeholder with this repository's model id or a local path)
+ model = SentenceTransformer("sentence_transformers_model_id")
+ # Run inference on word-segmented Vietnamese text, as PhoBERT expects
+ sentences = [
+     'Tuy_nhiên , nếu bệnh không tự lành và vẫn tiếp_tục chảy_máu , cần phải sử_dụng các liệu_pháp cầm máu để bù lại lượng máu đã mất .',
+     'Một_số yếu_tố làm tăng nguy_cơ mắc bệnh như : Yếu_tố nội_tiết : bệnh thường gặp ở phụ_nữ chậm có kinh và sớm mãn_kinh .',
+     'Nguyễn_Thị_Thanh_Tuyền ( 1995 ) .',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # (3, 768)
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # torch.Size([3, 3])
+ ```
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+
+ * Size: 362,208 training samples
+ * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | sentence_0                                                                         | sentence_1                                                                         | label                                                          |
+   |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:---------------------------------------------------------------|
+   | type    | string                                                                             | string                                                                             | float                                                          |
+   | details | <ul><li>min: 3 tokens</li><li>mean: 22.64 tokens</li><li>max: 104 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 23.25 tokens</li><li>max: 222 tokens</li></ul> | <ul><li>min: 0.1</li><li>mean: 0.82</li><li>max: 1.0</li></ul> |
+ * Samples:
+   | sentence_0 | sentence_1 | label |
+   |:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------|
+   | <code>Hiệu_lực của vaccine AstraZeneca ra sao ?</code> | <code>Hiệu_lực của vaccine AstraZeneca ra sao ?</code> | <code>1.0</code> |
+   | <code>Gần đây , tôi có quen một bạn gái , mỗi lần ngồi gần nhau có cử_chỉ thân_mật thì tôi gần như không kìm chế được có_thể nói là giống như hiện_tượng xuất_tinh sớm .</code> | <code>Chụp CT scanner sọ não : là hình_ảnh tốt nhất để đánh_giá tổn_thương não vì có_thể hiển_thị mô não hoặc xuất_huyết não hoặc nhũn_não .</code> | <code>0.6540138125419617</code> |
+   | <code>Sốt siêu_vi sau quan_hệ tình_dục không an_toàn có phải đã nhiễm HIV không ?</code> | <code>Sốt siêu_vi sau quan_hệ tình_dục không an_toàn có phải đã nhiễm HIV không ?</code> | <code>1.0</code> |
+ * Loss: [<code>ContrastiveLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#contrastiveloss) with these parameters:
+   ```json
+   {
+       "distance_metric": "SiameseDistanceMetric.COSINE_DISTANCE",
+       "margin": 0.5,
+       "size_average": true
+   }
+   ```
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
+ - `num_train_epochs`: 4
+ - `multi_dataset_batch_sampler`: round_robin
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1
+ - `num_train_epochs`: 4
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: round_robin
+
+ </details>
+
+ ### Training Logs
+ <details><summary>Click to expand</summary>
+
+ | Epoch  | Step  | Training Loss |
+ |:------:|:-----:|:-------------:|
+ | 0.0221 | 500   | 0.0168        |
+ | 0.0442 | 1000  | 0.0139        |
+ | 0.0663 | 1500  | 0.0142        |
+ | 0.0883 | 2000  | 0.0139        |
+ | 0.1104 | 2500  | 0.0137        |
+ | 0.1325 | 3000  | 0.0139        |
+ | 0.1546 | 3500  | 0.0137        |
+ | 0.1767 | 4000  | 0.0139        |
+ | 0.1988 | 4500  | 0.0136        |
+ | 0.2209 | 5000  | 0.0135        |
+ | 0.2430 | 5500  | 0.0137        |
+ | 0.2650 | 6000  | 0.0138        |
+ | 0.2871 | 6500  | 0.0136        |
+ | 0.3092 | 7000  | 0.0137        |
+ | 0.3313 | 7500  | 0.0138        |
+ | 0.3534 | 8000  | 0.0135        |
+ | 0.3755 | 8500  | 0.0138        |
+ | 0.3976 | 9000  | 0.0138        |
+ | 0.4196 | 9500  | 0.0141        |
+ | 0.4417 | 10000 | 0.0139        |
+ | 0.4638 | 10500 | 0.0139        |
+ | 0.4859 | 11000 | 0.0138        |
+ | 0.5080 | 11500 | 0.0141        |
+ | 0.5301 | 12000 | 0.0138        |
+ | 0.5522 | 12500 | 0.0138        |
+ | 0.5743 | 13000 | 0.0138        |
+ | 0.5963 | 13500 | 0.0138        |
+ | 0.6184 | 14000 | 0.0136        |
+ | 0.6405 | 14500 | 0.0139        |
+ | 0.6626 | 15000 | 0.0151        |
+ | 0.6847 | 15500 | 0.019         |
+ | 0.7068 | 16000 | 0.0184        |
+ | 0.7289 | 16500 | 0.018         |
+ | 0.7509 | 17000 | 0.0163        |
+ | 0.7730 | 17500 | 0.0164        |
+ | 0.7951 | 18000 | 0.0158        |
+ | 0.8172 | 18500 | 0.0155        |
+ | 0.8393 | 19000 | 0.0151        |
+ | 0.8614 | 19500 | 0.0151        |
+ | 0.8835 | 20000 | 0.0152        |
+ | 0.9056 | 20500 | 0.0152        |
+ | 0.9276 | 21000 | 0.0151        |
+ | 0.9497 | 21500 | 0.0148        |
+ | 0.9718 | 22000 | 0.015         |
+ | 0.9939 | 22500 | 0.0147        |
+ | 1.0160 | 23000 | 0.0149        |
+ | 1.0381 | 23500 | 0.0151        |
+ | 1.0602 | 24000 | 0.015         |
+ | 1.0823 | 24500 | 0.0148        |
+ | 1.1043 | 25000 | 0.0147        |
+ | 1.1264 | 25500 | 0.0149        |
+ | 1.1485 | 26000 | 0.0147        |
+ | 1.1706 | 26500 | 0.015         |
+ | 1.1927 | 27000 | 0.0146        |
+ | 1.2148 | 27500 | 0.0145        |
+ | 1.2369 | 28000 | 0.0147        |
+ | 1.2589 | 28500 | 0.0149        |
+ | 1.2810 | 29000 | 0.0147        |
+ | 1.3031 | 29500 | 0.0144        |
+ | 1.3252 | 30000 | 0.0147        |
+ | 1.3473 | 30500 | 0.0147        |
+ | 1.3694 | 31000 | 0.0145        |
+ | 1.3915 | 31500 | 0.0149        |
+ | 1.4136 | 32000 | 0.0147        |
+ | 1.4356 | 32500 | 0.0148        |
+ | 1.4577 | 33000 | 0.0148        |
+ | 1.4798 | 33500 | 0.0145        |
+ | 1.5019 | 34000 | 0.0149        |
+ | 1.5240 | 34500 | 0.0147        |
+ | 1.5461 | 35000 | 0.0146        |
+ | 1.5682 | 35500 | 0.0144        |
+ | 1.5902 | 36000 | 0.0146        |
+ | 1.6123 | 36500 | 0.0143        |
+ | 1.6344 | 37000 | 0.0145        |
+ | 1.6565 | 37500 | 0.0145        |
+ | 1.6786 | 38000 | 0.0146        |
+ | 1.7007 | 38500 | 0.0143        |
+ | 1.7228 | 39000 | 0.0149        |
+ | 1.7449 | 39500 | 0.0143        |
+ | 1.7669 | 40000 | 0.0146        |
+ | 1.7890 | 40500 | 0.0146        |
+ | 1.8111 | 41000 | 0.0146        |
+ | 1.8332 | 41500 | 0.0142        |
+ | 1.8553 | 42000 | 0.0144        |
+ | 1.8774 | 42500 | 0.0146        |
+ | 1.8995 | 43000 | 0.0147        |
+ | 1.9215 | 43500 | 0.0144        |
+ | 1.9436 | 44000 | 0.0145        |
+ | 1.9657 | 44500 | 0.0143        |
+ | 1.9878 | 45000 | 0.0146        |
+ | 2.0099 | 45500 | 0.0143        |
+ | 2.0320 | 46000 | 0.0147        |
+ | 2.0541 | 46500 | 0.0146        |
+ | 2.0762 | 47000 | 0.0144        |
+ | 2.0982 | 47500 | 0.0144        |
+ | 2.1203 | 48000 | 0.0144        |
+ | 2.1424 | 48500 | 0.0145        |
+ | 2.1645 | 49000 | 0.0144        |
+ | 2.1866 | 49500 | 0.0144        |
+ | 2.2087 | 50000 | 0.0141        |
+ | 2.2308 | 50500 | 0.0142        |
+ | 2.2528 | 51000 | 0.0145        |
+ | 2.2749 | 51500 | 0.0143        |
+ | 2.2970 | 52000 | 0.0141        |
+ | 2.3191 | 52500 | 0.0144        |
+ | 2.3412 | 53000 | 0.0143        |
+ | 2.3633 | 53500 | 0.0144        |
+ | 2.3854 | 54000 | 0.0144        |
+ | 2.4075 | 54500 | 0.0144        |
+ | 2.4295 | 55000 | 0.0145        |
+ | 2.4516 | 55500 | 0.0145        |
+ | 2.4737 | 56000 | 0.0144        |
+ | 2.4958 | 56500 | 0.0147        |
+ | 2.5179 | 57000 | 0.0145        |
+ | 2.5400 | 57500 | 0.0144        |
+ | 2.5621 | 58000 | 0.0143        |
+ | 2.5842 | 58500 | 0.0144        |
+ | 2.6062 | 59000 | 0.0143        |
+ | 2.6283 | 59500 | 0.0142        |
+ | 2.6504 | 60000 | 0.0143        |
+ | 2.6725 | 60500 | 0.0143        |
+ | 2.6946 | 61000 | 0.0143        |
+ | 2.7167 | 61500 | 0.0144        |
+ | 2.7388 | 62000 | 0.0143        |
+ | 2.7608 | 62500 | 0.0143        |
+ | 2.7829 | 63000 | 0.0146        |
+ | 2.8050 | 63500 | 0.0144        |
+ | 2.8271 | 64000 | 0.0141        |
+ | 2.8492 | 64500 | 0.0142        |
+ | 2.8713 | 65000 | 0.0143        |
+ | 2.8934 | 65500 | 0.0146        |
+ | 2.9155 | 66000 | 0.0143        |
+ | 2.9375 | 66500 | 0.0143        |
+ | 2.9596 | 67000 | 0.0141        |
+ | 2.9817 | 67500 | 0.0144        |
+ | 3.0038 | 68000 | 0.0143        |
+ | 3.0259 | 68500 | 0.0145        |
+ | 3.0480 | 69000 | 0.0142        |
+ | 3.0701 | 69500 | 0.0145        |
+ | 3.0921 | 70000 | 0.0142        |
+ | 3.1142 | 70500 | 0.0143        |
+ | 3.1363 | 71000 | 0.0142        |
+ | 3.1584 | 71500 | 0.0143        |
+ | 3.1805 | 72000 | 0.0143        |
+ | 3.2026 | 72500 | 0.014         |
+ | 3.2247 | 73000 | 0.0141        |
+ | 3.2468 | 73500 | 0.0142        |
+ | 3.2688 | 74000 | 0.0143        |
+ | 3.2909 | 74500 | 0.0141        |
+ | 3.3130 | 75000 | 0.0141        |
+ | 3.3351 | 75500 | 0.0143        |
+ | 3.3572 | 76000 | 0.0141        |
+ | 3.3793 | 76500 | 0.0143        |
+ | 3.4014 | 77000 | 0.0143        |
+ | 3.4234 | 77500 | 0.0146        |
+ | 3.4455 | 78000 | 0.0144        |
+ | 3.4676 | 78500 | 0.0143        |
+ | 3.4897 | 79000 | 0.0144        |
+ | 3.5118 | 79500 | 0.0145        |
+ | 3.5339 | 80000 | 0.0142        |
+ | 3.5560 | 80500 | 0.0144        |
+ | 3.5781 | 81000 | 0.0143        |
+ | 3.6001 | 81500 | 0.0142        |
+ | 3.6222 | 82000 | 0.0142        |
+ | 3.6443 | 82500 | 0.0142        |
+ | 3.6664 | 83000 | 0.014         |
+ | 3.6885 | 83500 | 0.0144        |
+ | 3.7106 | 84000 | 0.0141        |
+ | 3.7327 | 84500 | 0.0143        |
+ | 3.7547 | 85000 | 0.014         |
+ | 3.7768 | 85500 | 0.0146        |
+ | 3.7989 | 86000 | 0.0143        |
+ | 3.8210 | 86500 | 0.0142        |
+ | 3.8431 | 87000 | 0.0139        |
+ | 3.8652 | 87500 | 0.0143        |
+ | 3.8873 | 88000 | 0.0144        |
+ | 3.9094 | 88500 | 0.0143        |
+ | 3.9314 | 89000 | 0.0142        |
+ | 3.9535 | 89500 | 0.0142        |
+ | 3.9756 | 90000 | 0.0142        |
+
+ </details>
+
+ ### Framework Versions
+ - Python: 3.10.13
+ - Sentence Transformers: 3.1.0.dev0
+ - Transformers: 4.39.3
+ - PyTorch: 2.1.2
+ - Accelerate: 0.29.3
+ - Datasets: 2.18.0
+ - Tokenizers: 0.15.2
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### ContrastiveLoss
+ ```bibtex
+ @inproceedings{hadsell2006dimensionality,
+     author={Hadsell, R. and Chopra, S. and LeCun, Y.},
+     booktitle={2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)},
+     title={Dimensionality Reduction by Learning an Invariant Mapping},
+     year={2006},
+     volume={2},
+     number={},
+     pages={1735-1742},
+     doi={10.1109/CVPR.2006.100}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
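The README trains with `ContrastiveLoss` using cosine distance, `margin: 0.5` and `size_average: true`. As far as we understand the Sentence Transformers implementation, each pair contributes `0.5 * (y * d**2 + (1 - y) * max(0, margin - d)**2)` with `d = 1 - cos(u, v)`, and the batch loss is the mean when `size_average` is true. The plain-Python sketch below reproduces that formula; `cosine_distance` and `contrastive_loss` are hypothetical helpers, not library calls. Note that the training labels here are floats between 0.1 and 1.0 (mean 0.82), so in this formula they interpolate between the "pull together" and "push apart" terms rather than acting as strict 0/1 labels.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def contrastive_loss(pairs: List[Tuple[Sequence[float], Sequence[float]]],
                     labels: List[float], margin: float = 0.5,
                     size_average: bool = True) -> float:
    """Hadsell-style contrastive loss over (u, v) embedding pairs."""
    losses = []
    for (u, v), y in zip(pairs, labels):
        d = cosine_distance(u, v)
        # Similar pairs (y near 1) are penalised by d**2; dissimilar pairs
        # (y near 0) are penalised only while they sit closer than `margin`.
        losses.append(0.5 * (y * d ** 2 + (1.0 - y) * max(0.0, margin - d) ** 2))
    return sum(losses) / len(losses) if size_average else sum(losses)


identical = ([1.0, 0.0], [1.0, 0.0])   # d = 0
orthogonal = ([1.0, 0.0], [0.0, 1.0])  # d = 1
print(contrastive_loss([identical], [1.0]))   # 0.0
print(contrastive_loss([orthogonal], [1.0]))  # 0.5
print(contrastive_loss([orthogonal], [0.0]))  # 0.0 (already beyond the margin)
```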
added_tokens.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "<mask>": 64000
+ }
bpe.codes ADDED
The diff for this file is too large to render.
config.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "_name_or_path": "vinai/phobert-base-v2",
+   "architectures": [
+     "RobertaModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 0,
+   "classifier_dropout": null,
+   "eos_token_id": 2,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 258,
+   "model_type": "roberta",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 1,
+   "position_embedding_type": "absolute",
+   "tokenizer_class": "PhobertTokenizer",
+   "torch_dtype": "float32",
+   "transformers_version": "4.39.3",
+   "type_vocab_size": 1,
+   "use_cache": true,
+   "vocab_size": 64001
+ }
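`max_position_embeddings` is 258 while `max_seq_length` is 256. In RoBERTa-style models (this config declares `model_type: roberta`), position ids are counted from `pad_token_id + 1`, so with `pad_token_id: 1` the first real token gets position 2 and two table rows are effectively reserved. The sketch below mirrors that numbering in plain Python (`roberta_position_ids` is our own name, and the token ids are made up) to show that a full 256-token input uses position ids up to 257, the last valid row of a 258-row embedding table.

```python
from typing import List

PAD_TOKEN_ID = 1               # "pad_token_id" in config.json
MAX_POSITION_EMBEDDINGS = 258  # "max_position_embeddings" in config.json


def roberta_position_ids(input_ids: List[int], padding_idx: int = PAD_TOKEN_ID) -> List[int]:
    """Non-padding tokens get positions padding_idx + 1, padding_idx + 2, ...;
    padding tokens keep position padding_idx."""
    positions = []
    running = 0
    for token_id in input_ids:
        if token_id != padding_idx:
            running += 1
            positions.append(running + padding_idx)
        else:
            positions.append(padding_idx)
    return positions


# A short sequence: <s> x y </s> <pad> <pad>
print(roberta_position_ids([0, 500, 600, 2, 1, 1]))  # [2, 3, 4, 5, 1, 1]

# A full 256-token sequence (no padding) uses ids 2..257, all below 258.
full = roberta_position_ids([0] + [500] * 254 + [2])
print(max(full), max(full) < MAX_POSITION_EMBEDDINGS)  # 257 True
```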
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.1.0.dev0",
+     "transformers": "4.39.3",
+     "pytorch": "2.1.2"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
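`similarity_fn_name` is `null` in this config, while the README lists cosine similarity; to our understanding, Sentence Transformers falls back to cosine when no function is configured, so `model.similarity(embeddings, embeddings)` in the README's usage example yields an all-pairs cosine matrix. A dependency-free sketch of that matrix, with toy 3-d vectors standing in for the real 768-d embeddings (`cosine_similarity_matrix` is a hypothetical helper):

```python
import math
from typing import List


def cosine_similarity_matrix(embeddings: List[List[float]]) -> List[List[float]]:
    """All-pairs cosine similarity: entry [i][j] compares embedding i with embedding j."""
    norms = [math.sqrt(sum(x * x for x in vec)) for vec in embeddings]
    return [
        [sum(a * b for a, b in zip(u, v)) / (nu * nv) for v, nv in zip(embeddings, norms)]
        for u, nu in zip(embeddings, norms)
    ]


vectors = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [3.0, 4.0, 0.0]]
for row in cosine_similarity_matrix(vectors):
    print([round(score, 2) for score in row])
# [1.0, 0.0, 0.6]
# [0.0, 1.0, 0.8]
# [0.6, 0.8, 1.0]
```

Cosine ignores vector length (the second vector has norm 2, yet its self-similarity is 1.0), which is why it pairs naturally with the cosine-distance training objective.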
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a9d4a6db7578a9c2f29c471ecba2540b8522a455faff430999a8dadf897826b8
+ size 540015464
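`model.safetensors` is committed as a Git LFS pointer: three `key value` lines giving the pointer-spec version, the SHA-256 of the real object, and its size in bytes. The sketch below (function names are hypothetical) parses such a pointer and checks a downloaded file against it. At 540,015,464 bytes, the weights file is consistent with roughly 135M float32 parameters at 4 bytes each, allowing for the small safetensors header.

```python
import hashlib
from typing import Dict

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:a9d4a6db7578a9c2f29c471ecba2540b8522a455faff430999a8dadf897826b8
size 540015464
"""


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each non-empty 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(" ")
            fields[key] = value
    return fields


def matches_pointer(path: str, pointer: Dict[str, str]) -> bool:
    """Stream the file through the pointer's hash and compare digest and byte size."""
    algo, _, expected = pointer["oid"].partition(":")
    digest = hashlib.new(algo)
    size = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
            size += len(chunk)
    return digest.hexdigest() == expected and size == int(pointer["size"])


fields = parse_lfs_pointer(POINTER)
print(fields["size"])            # 540015464
print(int(fields["size"]) // 4)  # 135003866 (4-byte float32 slots)
# After downloading the weights (hypothetical local path):
# matches_pointer("model.safetensors", fields)
```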
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   }
+ ]
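`modules.json` lists the modules that run in sequence: module 0 is the Transformer stored at the repository root (`path: ""`) and module 1 is the Pooling layer whose config lives in `1_Pooling/`. A small sketch, assuming only the JSON shown above, that reads the list and resolves each module's directory in execution order; `describe_pipeline` and the `my_model` directory are hypothetical.

```python
import json
import os
from typing import List, Tuple

MODULES_JSON = """[
  {"idx": 0, "name": "0", "path": "", "type": "sentence_transformers.models.Transformer"},
  {"idx": 1, "name": "1", "path": "1_Pooling", "type": "sentence_transformers.models.Pooling"}
]"""


def describe_pipeline(modules_json: str, repo_dir: str) -> List[Tuple[str, str]]:
    """Return (class name, directory) pairs sorted by each module's idx."""
    modules = sorted(json.loads(modules_json), key=lambda m: m["idx"])
    return [
        (m["type"].rsplit(".", 1)[-1], os.path.join(repo_dir, m["path"]))
        for m in modules
    ]


for class_name, directory in describe_pipeline(MODULES_JSON, "my_model"):
    print(class_name, directory)
# On POSIX systems:
# Transformer my_model/
# Pooling my_model/1_Pooling
```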
optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a3d341f133c19d9a676f147da1a70460996cc9db7d9616a4b897900a4a508c2d
+ size 1080152634
rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b0406bf623d5d04576a8b9be491c9c293d43e339baa4b502fdf4e9218b981bc8
+ size 14244
scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cf1f98f00221cad9904202f2abd85131a25f85b16287bb2dab3e092ee2ace761
+ size 1064
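`scheduler.pt` holds the learning-rate scheduler state. The `learning_rate` values logged in `trainer_state.json` in this same commit fit a linear schedule that warms up over 9,056 steps to a peak of 3e-5, then decays linearly to zero at step 90,552 (362,208 samples / batch size 16 = 22,638 steps per epoch, times 4 epochs). This appears to differ from the `learning_rate: 5e-05` and `warmup_ratio: 0.0` listed under All Hyperparameters in the README. The peak and warmup length below are inferred from the logged values, not read from the training script; the sketch checks the inferred schedule against a few logged entries.

```python
import math

PEAK_LR = 3e-5                    # inferred from the logged values, not from a config file
WARMUP_STEPS = 9056               # inferred; about 10% of TOTAL_STEPS, rounded up
TOTAL_STEPS = (362208 // 16) * 4  # samples / batch size * epochs = 90552


def linear_warmup_decay(step: int, peak: float = PEAK_LR,
                        warmup: int = WARMUP_STEPS, total: int = TOTAL_STEPS) -> float:
    """Linear warmup from 0 to `peak`, then linear decay to 0 at `total`."""
    if step < warmup:
        return peak * step / max(1, warmup)
    return peak * max(0.0, (total - step) / max(1, total - warmup))


# A few learning_rate values copied from trainer_state.json
logged = {500: 1.6563604240282684e-06, 9000: 2.9814487632508834e-05,
          10000: 2.9652498282124278e-05}
for step, lr in logged.items():
    print(step, math.isclose(linear_warmup_decay(step), lr, rel_tol=1e-9))  # True for each
```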
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 256,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "bos_token": "<s>",
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "mask_token": "<mask>",
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "unk_token": "<unk>"
+ }
tokenizer_config.json ADDED
@@ -0,0 +1,54 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "64000": {
+       "content": "<mask>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "mask_token": "<mask>",
+   "model_max_length": 256,
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "tokenizer_class": "PhobertTokenizer",
+   "unk_token": "<unk>"
+ }
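The tokenizer config maps `<s>` (id 0) to both BOS and CLS, `<pad>` to id 1, `</s>` (id 2) to both EOS and SEP, and `<mask>` to id 64000, consistent with `vocab_size: 64001` in `config.json`. `PhobertTokenizer` follows the RoBERTa convention of wrapping a single sequence as `<s> ... </s>`. The sketch below reproduces that wrapping, truncation to 256 positions and right-padding on already-tokenized ids; `wrap_and_pad` and the sample ids are hypothetical, and for simplicity it pads to a fixed length, whereas batched encoding usually pads to the longest sequence in the batch. Separately, PhoBERT expects word-segmented Vietnamese input, with the syllables of a word joined by underscores as in the README's examples (`khí_hậu`, `bác_sĩ`), so raw text should first go through a word segmenter such as VnCoreNLP's RDRSegmenter.

```python
from typing import List, Tuple

BOS_ID, PAD_ID, EOS_ID = 0, 1, 2  # <s>, <pad>, </s> from added_tokens_decoder
MAX_LEN = 256                     # model_max_length / max_seq_length


def wrap_and_pad(token_ids: List[int], max_len: int = MAX_LEN) -> Tuple[List[int], List[int]]:
    """Wrap ids as <s> ... </s>, truncating the body so the result fits max_len,
    then right-pad with <pad> and build the matching attention mask."""
    body = token_ids[: max_len - 2]  # leave room for <s> and </s>
    ids = [BOS_ID] + body + [EOS_ID]
    mask = [1] * len(ids)
    padding = max_len - len(ids)
    return ids + [PAD_ID] * padding, mask + [0] * padding


ids, mask = wrap_and_pad([345, 678, 910], max_len=8)
print(ids)   # [0, 345, 678, 910, 2, 1, 1, 1]
print(mask)  # [1, 1, 1, 1, 1, 0, 0, 0]
```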
trainer_state.json ADDED
@@ -0,0 +1,1281 @@
+ {
+   "best_metric": null,
+   "best_model_checkpoint": null,
+   "epoch": 3.97561622051418,
+   "eval_steps": 0,
+   "global_step": 90000,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.02,
+       "grad_norm": 0.010943206027150154,
+       "learning_rate": 1.6563604240282684e-06,
+       "loss": 0.0168,
+       "step": 500
+     },
+     {
+       "epoch": 0.04,
+       "grad_norm": 0.010196206159889698,
+       "learning_rate": 3.312720848056537e-06,
+       "loss": 0.0139,
+       "step": 1000
+     },
+     {
+       "epoch": 0.07,
+       "grad_norm": 0.009697528555989265,
+       "learning_rate": 4.969081272084806e-06,
+       "loss": 0.0142,
+       "step": 1500
+     },
+     {
+       "epoch": 0.09,
+       "grad_norm": 0.011516646482050419,
+       "learning_rate": 6.625441696113074e-06,
+       "loss": 0.0139,
+       "step": 2000
+     },
+     {
+       "epoch": 0.11,
+       "grad_norm": 0.01910693757236004,
+       "learning_rate": 8.281802120141344e-06,
+       "loss": 0.0137,
+       "step": 2500
+     },
+     {
+       "epoch": 0.13,
+       "grad_norm": 0.004923399072140455,
+       "learning_rate": 9.938162544169612e-06,
+       "loss": 0.0139,
+       "step": 3000
+     },
+     {
+       "epoch": 0.15,
+       "grad_norm": 0.007313380483537912,
+       "learning_rate": 1.159452296819788e-05,
+       "loss": 0.0137,
+       "step": 3500
+     },
+     {
+       "epoch": 0.18,
+       "grad_norm": 0.003835548646748066,
+       "learning_rate": 1.3250883392226147e-05,
+       "loss": 0.0139,
+       "step": 4000
+     },
+     {
+       "epoch": 0.2,
+       "grad_norm": 0.007223702035844326,
+       "learning_rate": 1.4907243816254417e-05,
+       "loss": 0.0136,
+       "step": 4500
+     },
+     {
+       "epoch": 0.22,
+       "grad_norm": 0.010532427579164505,
+       "learning_rate": 1.6563604240282687e-05,
+       "loss": 0.0135,
+       "step": 5000
+     },
+     {
+       "epoch": 0.24,
+       "grad_norm": 0.0022224171552807093,
+       "learning_rate": 1.8219964664310956e-05,
+       "loss": 0.0137,
+       "step": 5500
+     },
+     {
+       "epoch": 0.27,
+       "grad_norm": 0.0046217963099479675,
+       "learning_rate": 1.9876325088339224e-05,
+       "loss": 0.0138,
+       "step": 6000
+     },
+     {
+       "epoch": 0.29,
+       "grad_norm": 0.007814540527760983,
+       "learning_rate": 2.1532685512367493e-05,
+       "loss": 0.0136,
+       "step": 6500
+     },
+     {
+       "epoch": 0.31,
+       "grad_norm": 0.011144978925585747,
+       "learning_rate": 2.318904593639576e-05,
+       "loss": 0.0137,
+       "step": 7000
+     },
+     {
+       "epoch": 0.33,
+       "grad_norm": 0.0014108135364949703,
+       "learning_rate": 2.484540636042403e-05,
+       "loss": 0.0138,
+       "step": 7500
+     },
+     {
+       "epoch": 0.35,
+       "grad_norm": 0.0039382693357765675,
+       "learning_rate": 2.6501766784452294e-05,
+       "loss": 0.0135,
+       "step": 8000
+     },
+     {
+       "epoch": 0.38,
+       "grad_norm": 0.014407187700271606,
+       "learning_rate": 2.8158127208480566e-05,
+       "loss": 0.0138,
+       "step": 8500
+     },
+     {
+       "epoch": 0.4,
+       "grad_norm": 0.003938812296837568,
+       "learning_rate": 2.9814487632508834e-05,
+       "loss": 0.0138,
+       "step": 9000
+     },
+     {
+       "epoch": 0.42,
+       "grad_norm": 0.0075125317089259624,
+       "learning_rate": 2.983655639540591e-05,
+       "loss": 0.0141,
+       "step": 9500
+     },
+     {
+       "epoch": 0.44,
+       "grad_norm": 0.004186388570815325,
+       "learning_rate": 2.9652498282124278e-05,
+       "loss": 0.0139,
+       "step": 10000
+     },
+     {
+       "epoch": 0.46,
+       "grad_norm": 0.004882327280938625,
+       "learning_rate": 2.9468440168842645e-05,
+       "loss": 0.0139,
+       "step": 10500
+     },
+     {
+       "epoch": 0.49,
+       "grad_norm": 0.002360760699957609,
+       "learning_rate": 2.9284382055561012e-05,
+       "loss": 0.0138,
+       "step": 11000
+     },
+     {
+       "epoch": 0.51,
+       "grad_norm": 0.003932056948542595,
+       "learning_rate": 2.910032394227938e-05,
+       "loss": 0.0141,
+       "step": 11500
+     },
+     {
+       "epoch": 0.53,
+       "grad_norm": 0.015781084075570107,
+       "learning_rate": 2.8916265828997743e-05,
+       "loss": 0.0138,
+       "step": 12000
+     },
+     {
+       "epoch": 0.55,
+       "grad_norm": 0.002987222746014595,
+       "learning_rate": 2.873220771571611e-05,
+       "loss": 0.0138,
+       "step": 12500
+     },
+     {
+       "epoch": 0.57,
+       "grad_norm": 0.002478894544765353,
+       "learning_rate": 2.8548149602434473e-05,
+       "loss": 0.0138,
+       "step": 13000
+     },
+     {
+       "epoch": 0.6,
+       "grad_norm": 0.00281110149808228,
+       "learning_rate": 2.836409148915284e-05,
+       "loss": 0.0138,
+       "step": 13500
+     },
+     {
+       "epoch": 0.62,
+       "grad_norm": 0.0016416048165410757,
+       "learning_rate": 2.8180033375871207e-05,
+       "loss": 0.0136,
+       "step": 14000
+     },
+     {
+       "epoch": 0.64,
+       "grad_norm": 0.003159256186336279,
+       "learning_rate": 2.7995975262589574e-05,
+       "loss": 0.0139,
+       "step": 14500
+     },
+     {
+       "epoch": 0.66,
216
+ "grad_norm": 0.03868310526013374,
217
+ "learning_rate": 2.781191714930794e-05,
218
+ "loss": 0.0151,
219
+ "step": 15000
220
+ },
221
+ {
222
+ "epoch": 0.68,
223
+ "grad_norm": 0.026466334238648415,
224
+ "learning_rate": 2.7627859036026308e-05,
225
+ "loss": 0.019,
226
+ "step": 15500
227
+ },
228
+ {
229
+ "epoch": 0.71,
230
+ "grad_norm": 0.014158655889332294,
231
+ "learning_rate": 2.7443800922744675e-05,
232
+ "loss": 0.0184,
233
+ "step": 16000
234
+ },
235
+ {
236
+ "epoch": 0.73,
237
+ "grad_norm": 0.0531819723546505,
238
+ "learning_rate": 2.7259742809463042e-05,
239
+ "loss": 0.018,
240
+ "step": 16500
241
+ },
242
+ {
243
+ "epoch": 0.75,
244
+ "grad_norm": 0.031152892857789993,
245
+ "learning_rate": 2.707568469618141e-05,
246
+ "loss": 0.0163,
247
+ "step": 17000
248
+ },
249
+ {
250
+ "epoch": 0.77,
251
+ "grad_norm": 0.045258529484272,
252
+ "learning_rate": 2.6891626582899776e-05,
253
+ "loss": 0.0164,
254
+ "step": 17500
255
+ },
256
+ {
257
+ "epoch": 0.8,
258
+ "grad_norm": 0.08072955161333084,
259
+ "learning_rate": 2.6707568469618143e-05,
260
+ "loss": 0.0158,
261
+ "step": 18000
262
+ },
263
+ {
264
+ "epoch": 0.82,
265
+ "grad_norm": 0.06747995316982269,
266
+ "learning_rate": 2.652351035633651e-05,
267
+ "loss": 0.0155,
268
+ "step": 18500
269
+ },
270
+ {
271
+ "epoch": 0.84,
272
+ "grad_norm": 0.029322072863578796,
273
+ "learning_rate": 2.6339452243054877e-05,
274
+ "loss": 0.0151,
275
+ "step": 19000
276
+ },
277
+ {
278
+ "epoch": 0.86,
279
+ "grad_norm": 0.03149860352277756,
280
+ "learning_rate": 2.615539412977324e-05,
281
+ "loss": 0.0151,
282
+ "step": 19500
283
+ },
284
+ {
285
+ "epoch": 0.88,
286
+ "grad_norm": 0.03845517709851265,
287
+ "learning_rate": 2.5971336016491608e-05,
288
+ "loss": 0.0152,
289
+ "step": 20000
290
+ },
291
+ {
292
+ "epoch": 0.91,
293
+ "grad_norm": 0.02114521712064743,
294
+ "learning_rate": 2.578727790320997e-05,
295
+ "loss": 0.0152,
296
+ "step": 20500
297
+ },
298
+ {
299
+ "epoch": 0.93,
300
+ "grad_norm": 0.04469970241189003,
301
+ "learning_rate": 2.560321978992834e-05,
302
+ "loss": 0.0151,
303
+ "step": 21000
304
+ },
305
+ {
306
+ "epoch": 0.95,
307
+ "grad_norm": 0.03439483791589737,
308
+ "learning_rate": 2.5419161676646705e-05,
309
+ "loss": 0.0148,
310
+ "step": 21500
311
+ },
312
+ {
313
+ "epoch": 0.97,
314
+ "grad_norm": 0.016261784359812737,
315
+ "learning_rate": 2.5235103563365072e-05,
316
+ "loss": 0.015,
317
+ "step": 22000
318
+ },
319
+ {
320
+ "epoch": 0.99,
321
+ "grad_norm": 0.04864068329334259,
322
+ "learning_rate": 2.505104545008344e-05,
323
+ "loss": 0.0147,
324
+ "step": 22500
325
+ },
326
+ {
327
+ "epoch": 1.02,
328
+ "grad_norm": 0.024570118635892868,
329
+ "learning_rate": 2.4866987336801806e-05,
330
+ "loss": 0.0149,
331
+ "step": 23000
332
+ },
333
+ {
334
+ "epoch": 1.04,
335
+ "grad_norm": 0.015043354593217373,
336
+ "learning_rate": 2.4682929223520173e-05,
337
+ "loss": 0.0151,
338
+ "step": 23500
339
+ },
340
+ {
341
+ "epoch": 1.06,
342
+ "grad_norm": 0.038648445159196854,
343
+ "learning_rate": 2.449887111023854e-05,
344
+ "loss": 0.015,
345
+ "step": 24000
346
+ },
347
+ {
348
+ "epoch": 1.08,
349
+ "grad_norm": 0.2623123824596405,
350
+ "learning_rate": 2.4314812996956907e-05,
351
+ "loss": 0.0148,
352
+ "step": 24500
353
+ },
354
+ {
355
+ "epoch": 1.1,
356
+ "grad_norm": 0.02235906571149826,
357
+ "learning_rate": 2.4130754883675274e-05,
358
+ "loss": 0.0147,
359
+ "step": 25000
360
+ },
361
+ {
362
+ "epoch": 1.13,
363
+ "grad_norm": 0.005854467861354351,
364
+ "learning_rate": 2.394669677039364e-05,
365
+ "loss": 0.0149,
366
+ "step": 25500
367
+ },
368
+ {
369
+ "epoch": 1.15,
370
+ "grad_norm": 0.011547247879207134,
371
+ "learning_rate": 2.376263865711201e-05,
372
+ "loss": 0.0147,
373
+ "step": 26000
374
+ },
375
+ {
376
+ "epoch": 1.17,
377
+ "grad_norm": 0.03119933046400547,
378
+ "learning_rate": 2.3578580543830375e-05,
379
+ "loss": 0.015,
380
+ "step": 26500
381
+ },
382
+ {
383
+ "epoch": 1.19,
384
+ "grad_norm": 0.047728102654218674,
385
+ "learning_rate": 2.339452243054874e-05,
386
+ "loss": 0.0146,
387
+ "step": 27000
388
+ },
389
+ {
390
+ "epoch": 1.21,
391
+ "grad_norm": 0.04931659996509552,
392
+ "learning_rate": 2.3210464317267106e-05,
393
+ "loss": 0.0145,
394
+ "step": 27500
395
+ },
396
+ {
397
+ "epoch": 1.24,
398
+ "grad_norm": 0.22793345153331757,
399
+ "learning_rate": 2.3026406203985473e-05,
400
+ "loss": 0.0147,
401
+ "step": 28000
402
+ },
403
+ {
404
+ "epoch": 1.26,
405
+ "grad_norm": 0.046943288296461105,
406
+ "learning_rate": 2.2842348090703837e-05,
407
+ "loss": 0.0149,
408
+ "step": 28500
409
+ },
410
+ {
411
+ "epoch": 1.28,
412
+ "grad_norm": 0.07938718795776367,
413
+ "learning_rate": 2.2658289977422203e-05,
414
+ "loss": 0.0147,
415
+ "step": 29000
416
+ },
417
+ {
418
+ "epoch": 1.3,
419
+ "grad_norm": 0.02574564516544342,
420
+ "learning_rate": 2.247423186414057e-05,
421
+ "loss": 0.0144,
422
+ "step": 29500
423
+ },
424
+ {
425
+ "epoch": 1.33,
426
+ "grad_norm": 0.011776907369494438,
427
+ "learning_rate": 2.2290173750858937e-05,
428
+ "loss": 0.0147,
429
+ "step": 30000
430
+ },
431
+ {
432
+ "epoch": 1.35,
433
+ "grad_norm": 0.0066869258880615234,
434
+ "learning_rate": 2.2106115637577304e-05,
435
+ "loss": 0.0147,
436
+ "step": 30500
437
+ },
438
+ {
439
+ "epoch": 1.37,
440
+ "grad_norm": 0.010923570953309536,
441
+ "learning_rate": 2.192205752429567e-05,
442
+ "loss": 0.0145,
443
+ "step": 31000
444
+ },
445
+ {
446
+ "epoch": 1.39,
447
+ "grad_norm": 0.03816843405365944,
448
+ "learning_rate": 2.173799941101404e-05,
449
+ "loss": 0.0149,
450
+ "step": 31500
451
+ },
452
+ {
453
+ "epoch": 1.41,
454
+ "grad_norm": 0.04863005876541138,
455
+ "learning_rate": 2.1553941297732405e-05,
456
+ "loss": 0.0147,
457
+ "step": 32000
458
+ },
459
+ {
460
+ "epoch": 1.44,
461
+ "grad_norm": 0.09232094883918762,
462
+ "learning_rate": 2.1369883184450772e-05,
463
+ "loss": 0.0148,
464
+ "step": 32500
465
+ },
466
+ {
467
+ "epoch": 1.46,
468
+ "grad_norm": 0.006221645046025515,
469
+ "learning_rate": 2.118582507116914e-05,
470
+ "loss": 0.0148,
471
+ "step": 33000
472
+ },
473
+ {
474
+ "epoch": 1.48,
475
+ "grad_norm": 0.008674496784806252,
476
+ "learning_rate": 2.1001766957887506e-05,
477
+ "loss": 0.0145,
478
+ "step": 33500
479
+ },
480
+ {
481
+ "epoch": 1.5,
482
+ "grad_norm": 0.013610797002911568,
483
+ "learning_rate": 2.081770884460587e-05,
484
+ "loss": 0.0149,
485
+ "step": 34000
486
+ },
487
+ {
488
+ "epoch": 1.52,
489
+ "grad_norm": 0.020487351343035698,
490
+ "learning_rate": 2.0633650731324237e-05,
491
+ "loss": 0.0147,
492
+ "step": 34500
493
+ },
494
+ {
495
+ "epoch": 1.55,
496
+ "grad_norm": 0.004388992674648762,
497
+ "learning_rate": 2.0449592618042604e-05,
498
+ "loss": 0.0146,
499
+ "step": 35000
500
+ },
501
+ {
502
+ "epoch": 1.57,
503
+ "grad_norm": 0.029407095164060593,
504
+ "learning_rate": 2.026553450476097e-05,
505
+ "loss": 0.0144,
506
+ "step": 35500
507
+ },
508
+ {
509
+ "epoch": 1.59,
510
+ "grad_norm": 0.04079248011112213,
511
+ "learning_rate": 2.0081476391479335e-05,
512
+ "loss": 0.0146,
513
+ "step": 36000
514
+ },
515
+ {
516
+ "epoch": 1.61,
517
+ "grad_norm": 0.033315982669591904,
518
+ "learning_rate": 1.98974182781977e-05,
519
+ "loss": 0.0143,
520
+ "step": 36500
521
+ },
522
+ {
523
+ "epoch": 1.63,
524
+ "grad_norm": 0.00864444486796856,
525
+ "learning_rate": 1.971336016491607e-05,
526
+ "loss": 0.0145,
527
+ "step": 37000
528
+ },
529
+ {
530
+ "epoch": 1.66,
531
+ "grad_norm": 0.03435393422842026,
532
+ "learning_rate": 1.9529302051634436e-05,
533
+ "loss": 0.0145,
534
+ "step": 37500
535
+ },
536
+ {
537
+ "epoch": 1.68,
538
+ "grad_norm": 0.008053929544985294,
539
+ "learning_rate": 1.9345243938352803e-05,
540
+ "loss": 0.0146,
541
+ "step": 38000
542
+ },
543
+ {
544
+ "epoch": 1.7,
545
+ "grad_norm": 0.004771470092236996,
546
+ "learning_rate": 1.916118582507117e-05,
547
+ "loss": 0.0143,
548
+ "step": 38500
549
+ },
550
+ {
551
+ "epoch": 1.72,
552
+ "grad_norm": 0.016594666987657547,
553
+ "learning_rate": 1.8977127711789537e-05,
554
+ "loss": 0.0149,
555
+ "step": 39000
556
+ },
557
+ {
558
+ "epoch": 1.74,
559
+ "grad_norm": 0.01181800477206707,
560
+ "learning_rate": 1.8793069598507904e-05,
561
+ "loss": 0.0143,
562
+ "step": 39500
563
+ },
564
+ {
565
+ "epoch": 1.77,
566
+ "grad_norm": 0.03508065640926361,
567
+ "learning_rate": 1.860901148522627e-05,
568
+ "loss": 0.0146,
569
+ "step": 40000
570
+ },
571
+ {
572
+ "epoch": 1.79,
573
+ "grad_norm": 0.028093870729207993,
574
+ "learning_rate": 1.8424953371944638e-05,
575
+ "loss": 0.0146,
576
+ "step": 40500
577
+ },
578
+ {
579
+ "epoch": 1.81,
580
+ "grad_norm": 0.05029403790831566,
581
+ "learning_rate": 1.8240895258663005e-05,
582
+ "loss": 0.0146,
583
+ "step": 41000
584
+ },
585
+ {
586
+ "epoch": 1.83,
587
+ "grad_norm": 0.006528925616294146,
588
+ "learning_rate": 1.8056837145381368e-05,
589
+ "loss": 0.0142,
590
+ "step": 41500
591
+ },
592
+ {
593
+ "epoch": 1.86,
594
+ "grad_norm": 0.02557162009179592,
595
+ "learning_rate": 1.7872779032099735e-05,
596
+ "loss": 0.0144,
597
+ "step": 42000
598
+ },
599
+ {
600
+ "epoch": 1.88,
601
+ "grad_norm": 0.025360217317938805,
602
+ "learning_rate": 1.7688720918818102e-05,
603
+ "loss": 0.0146,
604
+ "step": 42500
605
+ },
606
+ {
607
+ "epoch": 1.9,
608
+ "grad_norm": 0.04788580909371376,
609
+ "learning_rate": 1.750466280553647e-05,
610
+ "loss": 0.0147,
611
+ "step": 43000
612
+ },
613
+ {
614
+ "epoch": 1.92,
615
+ "grad_norm": 0.02906920574605465,
616
+ "learning_rate": 1.7320604692254836e-05,
617
+ "loss": 0.0144,
618
+ "step": 43500
619
+ },
620
+ {
621
+ "epoch": 1.94,
622
+ "grad_norm": 0.012823808006942272,
623
+ "learning_rate": 1.71365465789732e-05,
624
+ "loss": 0.0145,
625
+ "step": 44000
626
+ },
627
+ {
628
+ "epoch": 1.97,
629
+ "grad_norm": 0.008996455930173397,
630
+ "learning_rate": 1.6952488465691567e-05,
631
+ "loss": 0.0143,
632
+ "step": 44500
633
+ },
634
+ {
635
+ "epoch": 1.99,
636
+ "grad_norm": 0.010119748301804066,
637
+ "learning_rate": 1.6768430352409934e-05,
638
+ "loss": 0.0146,
639
+ "step": 45000
640
+ },
641
+ {
642
+ "epoch": 2.01,
643
+ "grad_norm": 0.02591855265200138,
644
+ "learning_rate": 1.65843722391283e-05,
645
+ "loss": 0.0143,
646
+ "step": 45500
647
+ },
648
+ {
649
+ "epoch": 2.03,
650
+ "grad_norm": 0.013729671947658062,
651
+ "learning_rate": 1.6400314125846668e-05,
652
+ "loss": 0.0147,
653
+ "step": 46000
654
+ },
655
+ {
656
+ "epoch": 2.05,
657
+ "grad_norm": 0.0771203562617302,
658
+ "learning_rate": 1.6216256012565035e-05,
659
+ "loss": 0.0146,
660
+ "step": 46500
661
+ },
662
+ {
663
+ "epoch": 2.08,
664
+ "grad_norm": 0.04501279070973396,
665
+ "learning_rate": 1.60321978992834e-05,
666
+ "loss": 0.0144,
667
+ "step": 47000
668
+ },
669
+ {
670
+ "epoch": 2.1,
671
+ "grad_norm": 0.03493111953139305,
672
+ "learning_rate": 1.584813978600177e-05,
673
+ "loss": 0.0144,
674
+ "step": 47500
675
+ },
676
+ {
677
+ "epoch": 2.12,
678
+ "grad_norm": 0.01472916454076767,
679
+ "learning_rate": 1.5664081672720136e-05,
680
+ "loss": 0.0144,
681
+ "step": 48000
682
+ },
683
+ {
684
+ "epoch": 2.14,
685
+ "grad_norm": 0.04763146862387657,
686
+ "learning_rate": 1.54800235594385e-05,
687
+ "loss": 0.0145,
688
+ "step": 48500
689
+ },
690
+ {
691
+ "epoch": 2.16,
692
+ "grad_norm": 0.024467509239912033,
693
+ "learning_rate": 1.5295965446156866e-05,
694
+ "loss": 0.0144,
695
+ "step": 49000
696
+ },
697
+ {
698
+ "epoch": 2.19,
699
+ "grad_norm": 0.01768341101706028,
700
+ "learning_rate": 1.5111907332875235e-05,
701
+ "loss": 0.0144,
702
+ "step": 49500
703
+ },
704
+ {
705
+ "epoch": 2.21,
706
+ "grad_norm": 0.06102894991636276,
707
+ "learning_rate": 1.49278492195936e-05,
708
+ "loss": 0.0141,
709
+ "step": 50000
710
+ },
711
+ {
712
+ "epoch": 2.23,
713
+ "grad_norm": 0.01851697266101837,
714
+ "learning_rate": 1.4743791106311966e-05,
715
+ "loss": 0.0142,
716
+ "step": 50500
717
+ },
718
+ {
719
+ "epoch": 2.25,
720
+ "grad_norm": 0.015444310382008553,
721
+ "learning_rate": 1.4559732993030333e-05,
722
+ "loss": 0.0145,
723
+ "step": 51000
724
+ },
725
+ {
726
+ "epoch": 2.27,
727
+ "grad_norm": 0.013120009563863277,
728
+ "learning_rate": 1.43756748797487e-05,
729
+ "loss": 0.0143,
730
+ "step": 51500
731
+ },
732
+ {
733
+ "epoch": 2.3,
734
+ "grad_norm": 0.01589464209973812,
735
+ "learning_rate": 1.4191616766467067e-05,
736
+ "loss": 0.0141,
737
+ "step": 52000
738
+ },
739
+ {
740
+ "epoch": 2.32,
741
+ "grad_norm": 0.040490709245204926,
742
+ "learning_rate": 1.4007558653185433e-05,
743
+ "loss": 0.0144,
744
+ "step": 52500
745
+ },
746
+ {
747
+ "epoch": 2.34,
748
+ "grad_norm": 0.025874989107251167,
749
+ "learning_rate": 1.38235005399038e-05,
750
+ "loss": 0.0143,
751
+ "step": 53000
752
+ },
753
+ {
754
+ "epoch": 2.36,
755
+ "grad_norm": 0.022394156083464622,
756
+ "learning_rate": 1.3639442426622166e-05,
757
+ "loss": 0.0144,
758
+ "step": 53500
759
+ },
760
+ {
761
+ "epoch": 2.39,
762
+ "grad_norm": 0.010273805819451809,
763
+ "learning_rate": 1.3455384313340531e-05,
764
+ "loss": 0.0144,
765
+ "step": 54000
766
+ },
767
+ {
768
+ "epoch": 2.41,
769
+ "grad_norm": 0.028374383226037025,
770
+ "learning_rate": 1.3271326200058898e-05,
771
+ "loss": 0.0144,
772
+ "step": 54500
773
+ },
774
+ {
775
+ "epoch": 2.43,
776
+ "grad_norm": 0.018441613763570786,
777
+ "learning_rate": 1.3087268086777265e-05,
778
+ "loss": 0.0145,
779
+ "step": 55000
780
+ },
781
+ {
782
+ "epoch": 2.45,
783
+ "grad_norm": 0.006460436619818211,
784
+ "learning_rate": 1.2903209973495632e-05,
785
+ "loss": 0.0145,
786
+ "step": 55500
787
+ },
788
+ {
789
+ "epoch": 2.47,
790
+ "grad_norm": 0.00770485308021307,
791
+ "learning_rate": 1.2719151860213999e-05,
792
+ "loss": 0.0144,
793
+ "step": 56000
794
+ },
795
+ {
796
+ "epoch": 2.5,
797
+ "grad_norm": 0.00849447026848793,
798
+ "learning_rate": 1.2535093746932366e-05,
799
+ "loss": 0.0147,
800
+ "step": 56500
801
+ },
802
+ {
803
+ "epoch": 2.52,
804
+ "grad_norm": 0.035621609538793564,
805
+ "learning_rate": 1.2351035633650733e-05,
806
+ "loss": 0.0145,
807
+ "step": 57000
808
+ },
809
+ {
810
+ "epoch": 2.54,
811
+ "grad_norm": 0.008923010900616646,
812
+ "learning_rate": 1.2166977520369098e-05,
813
+ "loss": 0.0144,
814
+ "step": 57500
815
+ },
816
+ {
817
+ "epoch": 2.56,
818
+ "grad_norm": 0.0056849876418709755,
819
+ "learning_rate": 1.1982919407087464e-05,
820
+ "loss": 0.0143,
821
+ "step": 58000
822
+ },
823
+ {
824
+ "epoch": 2.58,
825
+ "grad_norm": 0.0071659935638308525,
826
+ "learning_rate": 1.179886129380583e-05,
827
+ "loss": 0.0144,
828
+ "step": 58500
829
+ },
830
+ {
831
+ "epoch": 2.61,
832
+ "grad_norm": 0.021617043763399124,
833
+ "learning_rate": 1.1614803180524198e-05,
834
+ "loss": 0.0143,
835
+ "step": 59000
836
+ },
837
+ {
838
+ "epoch": 2.63,
839
+ "grad_norm": 0.011144719086587429,
840
+ "learning_rate": 1.1430745067242565e-05,
841
+ "loss": 0.0142,
842
+ "step": 59500
843
+ },
844
+ {
845
+ "epoch": 2.65,
846
+ "grad_norm": 0.010943782515823841,
847
+ "learning_rate": 1.1246686953960932e-05,
848
+ "loss": 0.0143,
849
+ "step": 60000
850
+ },
851
+ {
852
+ "epoch": 2.67,
853
+ "grad_norm": 0.010286700911819935,
854
+ "learning_rate": 1.1062628840679299e-05,
855
+ "loss": 0.0143,
856
+ "step": 60500
857
+ },
858
+ {
859
+ "epoch": 2.69,
860
+ "grad_norm": 0.010169615969061852,
861
+ "learning_rate": 1.0878570727397666e-05,
862
+ "loss": 0.0143,
863
+ "step": 61000
864
+ },
865
+ {
866
+ "epoch": 2.72,
867
+ "grad_norm": 0.032067082822322845,
868
+ "learning_rate": 1.069451261411603e-05,
869
+ "loss": 0.0144,
870
+ "step": 61500
871
+ },
872
+ {
873
+ "epoch": 2.74,
874
+ "grad_norm": 0.008680183440446854,
875
+ "learning_rate": 1.0510454500834396e-05,
876
+ "loss": 0.0143,
877
+ "step": 62000
878
+ },
879
+ {
880
+ "epoch": 2.76,
881
+ "grad_norm": 0.01648719422519207,
882
+ "learning_rate": 1.0326396387552763e-05,
883
+ "loss": 0.0143,
884
+ "step": 62500
885
+ },
886
+ {
887
+ "epoch": 2.78,
888
+ "grad_norm": 0.0210120789706707,
889
+ "learning_rate": 1.014233827427113e-05,
890
+ "loss": 0.0146,
891
+ "step": 63000
892
+ },
893
+ {
894
+ "epoch": 2.81,
895
+ "grad_norm": 0.034336596727371216,
896
+ "learning_rate": 9.958280160989497e-06,
897
+ "loss": 0.0144,
898
+ "step": 63500
899
+ },
900
+ {
901
+ "epoch": 2.83,
902
+ "grad_norm": 0.03138417750597,
903
+ "learning_rate": 9.774222047707864e-06,
904
+ "loss": 0.0141,
905
+ "step": 64000
906
+ },
907
+ {
908
+ "epoch": 2.85,
909
+ "grad_norm": 0.01799875684082508,
910
+ "learning_rate": 9.590163934426231e-06,
911
+ "loss": 0.0142,
912
+ "step": 64500
913
+ },
914
+ {
915
+ "epoch": 2.87,
916
+ "grad_norm": 0.02960127592086792,
917
+ "learning_rate": 9.406105821144595e-06,
918
+ "loss": 0.0143,
919
+ "step": 65000
920
+ },
921
+ {
922
+ "epoch": 2.89,
923
+ "grad_norm": 0.012712860479950905,
924
+ "learning_rate": 9.222047707862962e-06,
925
+ "loss": 0.0146,
926
+ "step": 65500
927
+ },
928
+ {
929
+ "epoch": 2.92,
930
+ "grad_norm": 0.009180006571114063,
931
+ "learning_rate": 9.037989594581329e-06,
932
+ "loss": 0.0143,
933
+ "step": 66000
934
+ },
935
+ {
936
+ "epoch": 2.94,
937
+ "grad_norm": 0.0106426402926445,
938
+ "learning_rate": 8.853931481299696e-06,
939
+ "loss": 0.0143,
940
+ "step": 66500
941
+ },
942
+ {
943
+ "epoch": 2.96,
944
+ "grad_norm": 0.03638075664639473,
945
+ "learning_rate": 8.669873368018063e-06,
946
+ "loss": 0.0141,
947
+ "step": 67000
948
+ },
949
+ {
950
+ "epoch": 2.98,
951
+ "grad_norm": 0.02028089202940464,
952
+ "learning_rate": 8.48581525473643e-06,
953
+ "loss": 0.0144,
954
+ "step": 67500
955
+ },
956
+ {
957
+ "epoch": 3.0,
958
+ "grad_norm": 0.004987742286175489,
959
+ "learning_rate": 8.301757141454797e-06,
960
+ "loss": 0.0143,
961
+ "step": 68000
962
+ },
963
+ {
964
+ "epoch": 3.03,
965
+ "grad_norm": 0.012421207502484322,
966
+ "learning_rate": 8.117699028173162e-06,
967
+ "loss": 0.0145,
968
+ "step": 68500
969
+ },
970
+ {
971
+ "epoch": 3.05,
972
+ "grad_norm": 0.05489884316921234,
973
+ "learning_rate": 7.933640914891527e-06,
974
+ "loss": 0.0142,
975
+ "step": 69000
976
+ },
977
+ {
978
+ "epoch": 3.07,
979
+ "grad_norm": 0.007833471521735191,
980
+ "learning_rate": 7.749582801609894e-06,
981
+ "loss": 0.0145,
982
+ "step": 69500
983
+ },
984
+ {
985
+ "epoch": 3.09,
986
+ "grad_norm": 0.014776123687624931,
987
+ "learning_rate": 7.565524688328261e-06,
988
+ "loss": 0.0142,
989
+ "step": 70000
990
+ },
991
+ {
992
+ "epoch": 3.11,
993
+ "grad_norm": 0.015590249560773373,
994
+ "learning_rate": 7.381466575046628e-06,
995
+ "loss": 0.0143,
996
+ "step": 70500
997
+ },
998
+ {
999
+ "epoch": 3.14,
1000
+ "grad_norm": 0.018214261159300804,
1001
+ "learning_rate": 7.197408461764995e-06,
1002
+ "loss": 0.0142,
1003
+ "step": 71000
1004
+ },
1005
+ {
1006
+ "epoch": 3.16,
1007
+ "grad_norm": 0.029773008078336716,
1008
+ "learning_rate": 7.013350348483361e-06,
1009
+ "loss": 0.0143,
1010
+ "step": 71500
1011
+ },
1012
+ {
1013
+ "epoch": 3.18,
1014
+ "grad_norm": 0.028139958158135414,
1015
+ "learning_rate": 6.8292922352017276e-06,
1016
+ "loss": 0.0143,
1017
+ "step": 72000
1018
+ },
1019
+ {
1020
+ "epoch": 3.2,
1021
+ "grad_norm": 0.024558302015066147,
1022
+ "learning_rate": 6.6452341219200945e-06,
1023
+ "loss": 0.014,
1024
+ "step": 72500
1025
+ },
1026
+ {
1027
+ "epoch": 3.22,
1028
+ "grad_norm": 0.05188705772161484,
1029
+ "learning_rate": 6.461176008638461e-06,
1030
+ "loss": 0.0141,
1031
+ "step": 73000
1032
+ },
1033
+ {
1034
+ "epoch": 3.25,
1035
+ "grad_norm": 0.02240253984928131,
1036
+ "learning_rate": 6.277117895356828e-06,
1037
+ "loss": 0.0142,
1038
+ "step": 73500
1039
+ },
1040
+ {
1041
+ "epoch": 3.27,
1042
+ "grad_norm": 0.015994379296898842,
1043
+ "learning_rate": 6.093059782075194e-06,
1044
+ "loss": 0.0143,
1045
+ "step": 74000
1046
+ },
1047
+ {
1048
+ "epoch": 3.29,
1049
+ "grad_norm": 0.014095323160290718,
1050
+ "learning_rate": 5.909001668793561e-06,
1051
+ "loss": 0.0141,
1052
+ "step": 74500
1053
+ },
1054
+ {
1055
+ "epoch": 3.31,
1056
+ "grad_norm": 0.0076615894213318825,
1057
+ "learning_rate": 5.724943555511927e-06,
1058
+ "loss": 0.0141,
1059
+ "step": 75000
1060
+ },
1061
+ {
1062
+ "epoch": 3.34,
1063
+ "grad_norm": 0.023330098018050194,
1064
+ "learning_rate": 5.540885442230294e-06,
1065
+ "loss": 0.0143,
1066
+ "step": 75500
1067
+ },
1068
+ {
1069
+ "epoch": 3.36,
1070
+ "grad_norm": 0.022397508844733238,
1071
+ "learning_rate": 5.35682732894866e-06,
1072
+ "loss": 0.0141,
1073
+ "step": 76000
1074
+ },
1075
+ {
1076
+ "epoch": 3.38,
1077
+ "grad_norm": 0.01998765394091606,
1078
+ "learning_rate": 5.172769215667027e-06,
1079
+ "loss": 0.0143,
1080
+ "step": 76500
1081
+ },
1082
+ {
1083
+ "epoch": 3.4,
1084
+ "grad_norm": 0.07193479686975479,
1085
+ "learning_rate": 4.988711102385393e-06,
1086
+ "loss": 0.0143,
1087
+ "step": 77000
1088
+ },
1089
+ {
1090
+ "epoch": 3.42,
1091
+ "grad_norm": 0.030124777927994728,
1092
+ "learning_rate": 4.80465298910376e-06,
1093
+ "loss": 0.0146,
1094
+ "step": 77500
1095
+ },
1096
+ {
1097
+ "epoch": 3.45,
1098
+ "grad_norm": 0.0762249082326889,
1099
+ "learning_rate": 4.620594875822126e-06,
1100
+ "loss": 0.0144,
1101
+ "step": 78000
1102
+ },
1103
+ {
1104
+ "epoch": 3.47,
1105
+ "grad_norm": 0.030013220384716988,
1106
+ "learning_rate": 4.4365367625404925e-06,
1107
+ "loss": 0.0143,
1108
+ "step": 78500
1109
+ },
1110
+ {
1111
+ "epoch": 3.49,
1112
+ "grad_norm": 0.013210024684667587,
1113
+ "learning_rate": 4.2524786492588595e-06,
1114
+ "loss": 0.0144,
1115
+ "step": 79000
1116
+ },
1117
+ {
1118
+ "epoch": 3.51,
1119
+ "grad_norm": 0.021476522088050842,
1120
+ "learning_rate": 4.068420535977226e-06,
1121
+ "loss": 0.0145,
1122
+ "step": 79500
1123
+ },
1124
+ {
1125
+ "epoch": 3.53,
1126
+ "grad_norm": 0.005120801739394665,
1127
+ "learning_rate": 3.884362422695593e-06,
1128
+ "loss": 0.0142,
1129
+ "step": 80000
1130
+ },
1131
+ {
1132
+ "epoch": 3.56,
1133
+ "grad_norm": 0.03378542512655258,
1134
+ "learning_rate": 3.7003043094139592e-06,
1135
+ "loss": 0.0144,
1136
+ "step": 80500
1137
+ },
1138
+ {
1139
+ "epoch": 3.58,
1140
+ "grad_norm": 0.004559422377496958,
1141
+ "learning_rate": 3.5162461961323254e-06,
1142
+ "loss": 0.0143,
1143
+ "step": 81000
1144
+ },
1145
+ {
1146
+ "epoch": 3.6,
1147
+ "grad_norm": 0.022087154909968376,
1148
+ "learning_rate": 3.3321880828506924e-06,
1149
+ "loss": 0.0142,
1150
+ "step": 81500
1151
+ },
1152
+ {
1153
+ "epoch": 3.62,
1154
+ "grad_norm": 0.027302134782075882,
1155
+ "learning_rate": 3.1481299695690585e-06,
1156
+ "loss": 0.0142,
1157
+ "step": 82000
1158
+ },
1159
+ {
1160
+ "epoch": 3.64,
1161
+ "grad_norm": 0.0070667564868927,
1162
+ "learning_rate": 2.964071856287425e-06,
1163
+ "loss": 0.0142,
1164
+ "step": 82500
1165
+ },
1166
+ {
1167
+ "epoch": 3.67,
1168
+ "grad_norm": 0.004392644390463829,
1169
+ "learning_rate": 2.7800137430057916e-06,
1170
+ "loss": 0.014,
1171
+ "step": 83000
1172
+ },
1173
+ {
1174
+ "epoch": 3.69,
1175
+ "grad_norm": 0.004756265785545111,
1176
+ "learning_rate": 2.595955629724158e-06,
1177
+ "loss": 0.0144,
1178
+ "step": 83500
1179
+ },
1180
+ {
1181
+ "epoch": 3.71,
1182
+ "grad_norm": 0.028167065232992172,
1183
+ "learning_rate": 2.4118975164425248e-06,
1184
+ "loss": 0.0141,
1185
+ "step": 84000
1186
+ },
1187
+ {
1188
+ "epoch": 3.73,
1189
+ "grad_norm": 0.04241223633289337,
1190
+ "learning_rate": 2.2278394031608913e-06,
1191
+ "loss": 0.0143,
1192
+ "step": 84500
1193
+ },
1194
+ {
1195
+ "epoch": 3.75,
1196
+ "grad_norm": 0.0073333000764250755,
1197
+ "learning_rate": 2.043781289879258e-06,
1198
+ "loss": 0.014,
1199
+ "step": 85000
1200
+ },
1201
+ {
1202
+ "epoch": 3.78,
1203
+ "grad_norm": 0.022960776463150978,
1204
+ "learning_rate": 1.8597231765976245e-06,
1205
+ "loss": 0.0146,
1206
+ "step": 85500
1207
+ },
1208
+ {
1209
+ "epoch": 3.8,
1210
+ "grad_norm": 0.009491208009421825,
1211
+ "learning_rate": 1.675665063315991e-06,
1212
+ "loss": 0.0143,
1213
+ "step": 86000
1214
+ },
1215
+ {
1216
+ "epoch": 3.82,
1217
+ "grad_norm": 0.04244249686598778,
1218
+ "learning_rate": 1.4916069500343576e-06,
1219
+ "loss": 0.0142,
1220
+ "step": 86500
1221
+ },
1222
+ {
1223
+ "epoch": 3.84,
1224
+ "grad_norm": 0.0845978856086731,
1225
+ "learning_rate": 1.307548836752724e-06,
1226
+ "loss": 0.0139,
1227
+ "step": 87000
1228
+ },
1229
+ {
1230
+ "epoch": 3.87,
1231
+ "grad_norm": 0.012684383429586887,
1232
+ "learning_rate": 1.1234907234710905e-06,
1233
+ "loss": 0.0143,
1234
+ "step": 87500
1235
+ },
1236
+ {
1237
+ "epoch": 3.89,
1238
+ "grad_norm": 0.010290221311151981,
1239
+ "learning_rate": 9.394326101894571e-07,
1240
+ "loss": 0.0144,
1241
+ "step": 88000
1242
+ },
1243
+ {
1244
+ "epoch": 3.91,
1245
+ "grad_norm": 0.009122644551098347,
1246
+ "learning_rate": 7.553744969078238e-07,
1247
+ "loss": 0.0143,
1248
+ "step": 88500
1249
+ },
1250
+ {
1251
+ "epoch": 3.93,
1252
+ "grad_norm": 0.02026693895459175,
1253
+ "learning_rate": 5.713163836261902e-07,
1254
+ "loss": 0.0142,
1255
+ "step": 89000
1256
+ },
1257
+ {
1258
+ "epoch": 3.95,
1259
+ "grad_norm": 0.03241865336894989,
1260
+ "learning_rate": 3.8725827034455676e-07,
1261
+ "loss": 0.0142,
1262
+ "step": 89500
1263
+ },
1264
+ {
1265
+ "epoch": 3.98,
1266
+ "grad_norm": 0.04753628000617027,
1267
+ "learning_rate": 2.0320015706292333e-07,
1268
+ "loss": 0.0142,
1269
+ "step": 90000
1270
+ }
1271
+ ],
1272
+ "logging_steps": 500,
1273
+ "max_steps": 90552,
1274
+ "num_input_tokens_seen": 0,
1275
+ "num_train_epochs": 4,
1276
+ "save_steps": 10000,
1277
+ "total_flos": 0.0,
1278
+ "train_batch_size": 32,
1279
+ "trial_name": null,
1280
+ "trial_params": null
1281
+ }
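The logged `learning_rate` values appear consistent with a linear warmup followed by linear decay to zero at `max_steps`, the same shape as `transformers.get_linear_schedule_with_warmup`. The sketch below reproduces the logged values from the step number. Two parameters are inferred from the numbers above and are not recorded in this file: a peak learning rate of 3e-5 and 9056 warmup steps (`ceil(0.1 * max_steps)`). The `epoch` field is derived assuming every epoch has the same number of steps (`max_steps / num_train_epochs`).

```python
import math

# Taken from the trainer state above.
MAX_STEPS = 90552
NUM_EPOCHS = 4

# Inferred from the logged learning rates (assumption, not recorded in this file):
# peak learning rate 3e-5 and 9056 warmup steps, i.e. ceil(0.1 * MAX_STEPS).
PEAK_LR = 3e-5
WARMUP_STEPS = 9056


def linear_warmup_decay_lr(step: int) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step / WARMUP_STEPS)
    remaining = (MAX_STEPS - step) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, remaining)


def approx_epoch(step: int) -> float:
    """Fractional epoch rounded to 2 decimals, assuming equal-length epochs."""
    steps_per_epoch = MAX_STEPS / NUM_EPOCHS  # 22638 steps per epoch
    return round(step / steps_per_epoch, 2)


# (step, logged learning_rate, logged epoch) copied from entries above.
LOGGED = [
    (3000, 9.938162544169612e-06, 0.13),
    (9500, 2.983655639540591e-05, 0.42),
    (68000, 8.301757141454797e-06, 3.0),
    (90000, 2.0320015706292333e-07, 3.98),
]

for step, lr, epoch in LOGGED:
    lr_ok = math.isclose(linear_warmup_decay_lr(step), lr, rel_tol=1e-9)
    epoch_ok = approx_epoch(step) == epoch
    print(f"step {step}: lr matches={lr_ok}, epoch matches={epoch_ok}")
```

Under these assumptions the peak of 3e-5 is reached at step 9056, between the logged steps 9000 and 9500. This explains why `learning_rate` rises through step 9000 and decreases from step 9500 onward.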
training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6c9d2d8c930913ff2f4ae4382de12a1443b6cd2be1903c4675e15ed7f1a359e1
3
+ size 5176
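`training_args.bin` is committed as a Git LFS pointer rather than as the file itself. The pointer consists of three `key value` lines: the spec version, the object ID (a SHA-256 digest), and the object size in bytes (5176). The following minimal sketch parses such a pointer and checks a downloaded blob against it. The function names are illustrative and do not come from any library.

```python
import hashlib

LFS_SPEC_V1 = "https://git-lfs.github.com/spec/v1"

# Pointer text copied from training_args.bin above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:6c9d2d8c930913ff2f4ae4382de12a1443b6cd2be1903c4675e15ed7f1a359e1
size 5176
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if fields.get("version") != LFS_SPEC_V1:
        raise ValueError("not a Git LFS v1 pointer")
    algorithm, _, digest = fields["oid"].partition(":")
    return {"algorithm": algorithm, "oid": digest, "size": int(fields["size"])}


def blob_matches_pointer(blob: bytes, pointer: dict) -> bool:
    """Check a downloaded object against the pointer's size and SHA-256 digest."""
    if pointer["algorithm"] != "sha256":
        raise ValueError(f"unsupported oid algorithm: {pointer['algorithm']}")
    return len(blob) == pointer["size"] and hashlib.sha256(blob).hexdigest() == pointer["oid"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["size"])  # 5176
```

The size check comes first because it is cheap and rejects most truncated downloads before any hashing is done.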
vocab.txt ADDED
The diff for this file is too large to render.