---
license: apache-2.0
base_model: distilbert/distilbert-base-uncased
tags:
  - generated_from_trainer
datasets:
  - hdfs_rlhf_log_summary_dataset
model-index:
  - name: log_sage_reward_model
    results: []
---

# log_sage_reward_model

This model is a fine-tuned version of [distilbert/distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased) on the hdfs_rlhf_log_summary_dataset dataset. It achieves the following results on the evaluation set:

- Loss: 0.0005
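
The card does not include usage code; below is a minimal sketch of loading the checkpoint as a sequence-classification reward model and scoring a single log summary. The repo id `IrwinD/log_sage_reward_model`, the single-logit reward head, and the example summary text are assumptions, not details confirmed by this card.

```python
# Minimal sketch: score a candidate log summary with the reward model.
# The repo id and single-logit head are assumptions; adjust to the
# actual checkpoint layout.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "IrwinD/log_sage_reward_model"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Illustrative HDFS-style log summary (not from the training data).
summary = "DataNode heartbeat lost; NameNode re-replicated the affected blocks."
inputs = tokenizer(summary, return_tensors="pt", truncation=True)
with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()
print(f"reward score: {score:.4f}")
```

Higher scores would indicate summaries the reward model prefers; for pairwise comparison, score both candidates and take the difference.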

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1.41e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 40
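
For readers reproducing the run, these hyperparameters translate into Hugging Face `TrainingArguments` roughly as follows. This is a sketch: the `output_dir` is an assumption, and the Adam betas and epsilon shown are also the library defaults.

```python
# Sketch of TrainingArguments matching the hyperparameters listed above.
# output_dir is a hypothetical choice, not confirmed by the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="log_sage_reward_model",
    learning_rate=1.41e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    num_train_epochs=40,
    lr_scheduler_type="linear",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```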

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log        | 1.0   | 11   | 0.0022          |
| No log        | 2.0   | 22   | 0.0049          |
| No log        | 3.0   | 33   | 0.0006          |
| No log        | 4.0   | 44   | 0.0006          |
| No log        | 5.0   | 55   | 0.0008          |
| No log        | 6.0   | 66   | 0.0003          |
| No log        | 7.0   | 77   | 0.0005          |
| No log        | 8.0   | 88   | 0.0010          |
| No log        | 9.0   | 99   | 0.0008          |
| No log        | 10.0  | 110  | 0.0007          |
| No log        | 11.0  | 121  | 0.0007          |
| No log        | 12.0  | 132  | 0.0006          |
| No log        | 13.0  | 143  | 0.0006          |
| No log        | 14.0  | 154  | 0.0004          |
| No log        | 15.0  | 165  | 0.0007          |
| No log        | 16.0  | 176  | 0.0007          |
| No log        | 17.0  | 187  | 0.0006          |
| No log        | 18.0  | 198  | 0.0004          |
| No log        | 19.0  | 209  | 0.0005          |
| No log        | 20.0  | 220  | 0.0006          |
| No log        | 21.0  | 231  | 0.0006          |
| No log        | 22.0  | 242  | 0.0006          |
| No log        | 23.0  | 253  | 0.0009          |
| No log        | 24.0  | 264  | 0.0006          |
| No log        | 25.0  | 275  | 0.0007          |
| No log        | 26.0  | 286  | 0.0005          |
| No log        | 27.0  | 297  | 0.0005          |
| No log        | 28.0  | 308  | 0.0004          |
| No log        | 29.0  | 319  | 0.0004          |
| No log        | 30.0  | 330  | 0.0005          |
| No log        | 31.0  | 341  | 0.0005          |
| No log        | 32.0  | 352  | 0.0005          |
| No log        | 33.0  | 363  | 0.0005          |
| No log        | 34.0  | 374  | 0.0004          |
| No log        | 35.0  | 385  | 0.0004          |
| No log        | 36.0  | 396  | 0.0005          |
| No log        | 37.0  | 407  | 0.0005          |
| No log        | 38.0  | 418  | 0.0005          |
| No log        | 39.0  | 429  | 0.0005          |
| No log        | 40.0  | 440  | 0.0005          |

### Framework versions

- Transformers 4.39.0
- Pytorch 2.2.1+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2
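
A quick way to confirm a local environment matches these versions is the following sketch, using only the standard library; the package names follow the pip distributions.

```python
# Sketch: compare installed package versions against those listed above.
from importlib.metadata import version

expected = {
    "transformers": "4.39.0",
    "torch": "2.2.1+cu121",
    "datasets": "2.18.0",
    "tokenizers": "0.15.2",
}
for pkg, want in expected.items():
    have = version(pkg)
    status = "OK" if have == want else f"differs (expected {want})"
    print(f"{pkg}: {have} {status}")
```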