FINGU-AI committed on
Commit
7339680
1 Parent(s): 90fc0d7

Update README.md

remove training details

Files changed (1)
  1. README.md +0 -154
README.md CHANGED
@@ -237,160 +237,6 @@ You can finetune this model on your own dataset.
  *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
  -->
 
- ## Training Details
-
- ### Training Hyperparameters
- #### Non-Default Hyperparameters
-
- - `eval_strategy`: steps
- - `per_device_train_batch_size`: 2
- - `per_device_eval_batch_size`: 2
- - `gradient_accumulation_steps`: 8
- - `learning_rate`: 2e-05
- - `num_train_epochs`: 1
- - `lr_scheduler_type`: cosine
- - `warmup_ratio`: 0.1
- - `warmup_steps`: 5
- - `bf16`: True
- - `tf32`: True
- - `optim`: adamw_torch_fused
- - `gradient_checkpointing`: True
- - `gradient_checkpointing_kwargs`: {'use_reentrant': False}
- - `batch_sampler`: no_duplicates
-
- #### All Hyperparameters
- <details><summary>Click to expand</summary>
-
- - `overwrite_output_dir`: False
- - `do_predict`: False
- - `eval_strategy`: steps
- - `prediction_loss_only`: True
- - `per_device_train_batch_size`: 2
- - `per_device_eval_batch_size`: 2
- - `per_gpu_train_batch_size`: None
- - `per_gpu_eval_batch_size`: None
- - `gradient_accumulation_steps`: 8
- - `eval_accumulation_steps`: None
- - `learning_rate`: 2e-05
- - `weight_decay`: 0.0
- - `adam_beta1`: 0.9
- - `adam_beta2`: 0.999
- - `adam_epsilon`: 1e-08
- - `max_grad_norm`: 1.0
- - `num_train_epochs`: 1
- - `max_steps`: -1
- - `lr_scheduler_type`: cosine
- - `lr_scheduler_kwargs`: {}
- - `warmup_ratio`: 0.1
- - `warmup_steps`: 5
- - `log_level`: passive
- - `log_level_replica`: warning
- - `log_on_each_node`: True
- - `logging_nan_inf_filter`: True
- - `save_safetensors`: True
- - `save_on_each_node`: False
- - `save_only_model`: False
- - `restore_callback_states_from_checkpoint`: False
- - `no_cuda`: False
- - `use_cpu`: False
- - `use_mps_device`: False
- - `seed`: 42
- - `data_seed`: None
- - `jit_mode_eval`: False
- - `use_ipex`: False
- - `bf16`: True
- - `fp16`: False
- - `fp16_opt_level`: O1
- - `half_precision_backend`: auto
- - `bf16_full_eval`: False
- - `fp16_full_eval`: False
- - `tf32`: True
- - `local_rank`: 3
- - `ddp_backend`: None
- - `tpu_num_cores`: None
- - `tpu_metrics_debug`: False
- - `debug`: []
- - `dataloader_drop_last`: True
- - `dataloader_num_workers`: 0
- - `dataloader_prefetch_factor`: None
- - `past_index`: -1
- - `disable_tqdm`: False
- - `remove_unused_columns`: True
- - `label_names`: None
- - `load_best_model_at_end`: False
- - `ignore_data_skip`: False
- - `fsdp`: []
- - `fsdp_min_num_params`: 0
- - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- - `fsdp_transformer_layer_cls_to_wrap`: None
- - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- - `deepspeed`: None
- - `label_smoothing_factor`: 0.0
- - `optim`: adamw_torch_fused
- - `optim_args`: None
- - `adafactor`: False
- - `group_by_length`: False
- - `length_column_name`: length
- - `ddp_find_unused_parameters`: None
- - `ddp_bucket_cap_mb`: None
- - `ddp_broadcast_buffers`: False
- - `dataloader_pin_memory`: True
- - `dataloader_persistent_workers`: False
- - `skip_memory_metrics`: True
- - `use_legacy_prediction_loop`: False
- - `push_to_hub`: False
- - `resume_from_checkpoint`: None
- - `hub_model_id`: None
- - `hub_strategy`: every_save
- - `hub_private_repo`: False
- - `hub_always_push`: False
- - `gradient_checkpointing`: True
- - `gradient_checkpointing_kwargs`: {'use_reentrant': False}
- - `include_inputs_for_metrics`: False
- - `eval_do_concat_batches`: True
- - `fp16_backend`: auto
- - `push_to_hub_model_id`: None
- - `push_to_hub_organization`: None
- - `mp_parameters`:
- - `auto_find_batch_size`: False
- - `full_determinism`: False
- - `torchdynamo`: None
- - `ray_scope`: last
- - `ddp_timeout`: 1800
- - `torch_compile`: False
- - `torch_compile_backend`: None
- - `torch_compile_mode`: None
- - `dispatch_batches`: None
- - `split_batches`: None
- - `include_tokens_per_second`: False
- - `include_num_input_tokens_seen`: False
- - `neftune_noise_alpha`: None
- - `optim_target_modules`: None
- - `batch_eval_metrics`: False
- - `batch_sampler`: no_duplicates
- - `multi_dataset_batch_sampler`: proportional
-
- </details>
-
- ### Training Logs
- | Epoch | Step | Training Loss | reranking loss | retrival loss | sts loss |
- |:------:|:----:|:-------------:|:--------------:|:-------------:|:--------:|
- | 0.1958 | 500 | 0.5225 | 0.3536 | 0.0413 | 0.5239 |
- | 0.3916 | 1000 | 0.2167 | 0.2598 | 0.0386 | 0.4230 |
- | 0.5875 | 1500 | 0.1924 | 0.2372 | 0.0320 | 0.4046 |
- | 0.7833 | 2000 | 0.1795 | 0.2292 | 0.0310 | 0.4005 |
- | 0.9791 | 2500 | 0.1755 | 0.2276 | 0.0306 | 0.3995 |
-
-
- ### Framework Versions
- - Python: 3.10.12
- - Sentence Transformers: 3.0.1
- - Transformers: 4.41.2
- - PyTorch: 2.2.0+cu121
- - Accelerate: 0.32.1
- - Datasets: 2.20.0
- - Tokenizers: 0.19.1
-
  ## Citation
 
  ### BibTeX
 