---
library_name: transformers
license: apache-2.0
base_model: tsavage68/Na_M2_1000steps_1e7_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: Na_M2_1000steps_1e6rate_05beta_cSFTDPO
  results: []
---

# Na_M2_1000steps_1e6rate_05beta_cSFTDPO

This model was fine-tuned with DPO from [tsavage68/Na_M2_1000steps_1e7_SFT](https://huggingface.co/tsavage68/Na_M2_1000steps_1e7_SFT) on an unknown dataset. It achieves the following results on the evaluation set (a minimal loading sketch follows the metrics):

- Loss: 0.0000
- Rewards/chosen: 3.9080
- Rewards/rejected: -14.8092
- Rewards/accuracies: 1.0
- Rewards/margins: 18.7172
- Logps/rejected: -109.5417
- Logps/chosen: -40.3163
- Logits/rejected: -2.5104
- Logits/chosen: -2.5251
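
Below is a minimal inference sketch. The repository id is an assumption inferred from this card's name and namespace, and the prompt format is a placeholder, since the card does not document either:

```python
# Minimal inference sketch; the repo id is assumed from this card's name,
# and the prompt format is a placeholder (not documented in the card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "tsavage68/Na_M2_1000steps_1e6rate_05beta_cSFTDPO"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the accelerate package
)

prompt = "..."  # placeholder: expected prompt format is not documented
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```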

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto a TRL `DPOConfig` follows this list):

- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
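
As a hedged sketch only: the following shows how these hyperparameters might map onto a TRL `DPOTrainer` run. The dataset file, the model class, and `beta=0.5` (inferred from the `05beta` suffix in the model name) are assumptions not confirmed by this card; the `DPOConfig` API shown is the TRL ~0.9 style that pairs with Transformers 4.44.2.

```python
# Hypothetical reconstruction of the training setup described above.
# Assumptions: beta=0.5 (from the "05beta" name suffix), a preference dataset
# with "prompt"/"chosen"/"rejected" columns, and TRL ~0.9 APIs.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/Na_M2_1000steps_1e7_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Hypothetical dataset file; the actual training data is not documented.
train_dataset = load_dataset("json", data_files="preferences.jsonl")["train"]

args = DPOConfig(
    output_dir="Na_M2_1000steps_1e6rate_05beta_cSFTDPO",
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # effective train batch size of 4
    seed=42,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    beta=0.5,  # assumption: inferred from the "05beta" model-name suffix
    # Adam betas=(0.9, 0.999) and epsilon=1e-8 are the defaults, so not set.
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # TRL clones the policy as the frozen reference when None
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```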

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.0           | 0.2667 | 50   | 0.0000          | 2.9065         | -11.8929         | 1.0                | 14.7994         | -103.7091      | -42.3194     | -2.5220         | -2.5359       |
| 0.0           | 0.5333 | 100  | 0.0000          | 3.3215         | -13.0434         | 1.0                | 16.3649         | -106.0102      | -41.4894     | -2.5188         | -2.5330       |
| 0.0           | 0.8    | 150  | 0.0000          | 3.5466         | -13.5189         | 1.0                | 17.0655         | -106.9612      | -41.0391     | -2.5174         | -2.5317       |
| 0.0           | 1.0667 | 200  | 0.0000          | 3.6119         | -13.8627         | 1.0                | 17.4745         | -107.6487      | -40.9087     | -2.5143         | -2.5288       |
| 0.0           | 1.3333 | 250  | 0.0000          | 3.7085         | -14.0115         | 1.0                | 17.7200         | -107.9463      | -40.7154     | -2.5151         | -2.5296       |
| 0.0           | 1.6    | 300  | 0.0000          | 3.7952         | -14.1247         | 1.0                | 17.9199         | -108.1728      | -40.5420     | -2.5141         | -2.5286       |
| 0.0           | 1.8667 | 350  | 0.0000          | 3.7740         | -14.2878         | 1.0                | 18.0618         | -108.4989      | -40.5843     | -2.5139         | -2.5284       |
| 0.0           | 2.1333 | 400  | 0.0000          | 3.8254         | -14.4626         | 1.0                | 18.2880         | -108.8486      | -40.4816     | -2.5124         | -2.5269       |
| 0.0           | 2.4    | 450  | 0.0000          | 3.8372         | -14.5044         | 1.0                | 18.3416         | -108.9322      | -40.4579     | -2.5127         | -2.5273       |
| 0.0           | 2.6667 | 500  | 0.0000          | 3.8544         | -14.6284         | 1.0                | 18.4828         | -109.1802      | -40.4237     | -2.5115         | -2.5260       |
| 0.0           | 2.9333 | 550  | 0.0000          | 3.8744         | -14.6609         | 1.0                | 18.5353         | -109.2451      | -40.3835     | -2.5116         | -2.5262       |
| 0.0           | 3.2    | 600  | 0.0000          | 3.9000         | -14.7002         | 1.0                | 18.6002         | -109.3238      | -40.3324     | -2.5103         | -2.5249       |
| 0.0           | 3.4667 | 650  | 0.0000          | 3.9168         | -14.7537         | 1.0                | 18.6705         | -109.4308      | -40.2988     | -2.5105         | -2.5252       |
| 0.0           | 3.7333 | 700  | 0.0000          | 3.9175         | -14.7437         | 1.0                | 18.6612         | -109.4108      | -40.2974     | -2.5112         | -2.5259       |
| 0.0           | 4.0    | 750  | 0.0000          | 3.9128         | -14.7841         | 1.0                | 18.6969         | -109.4916      | -40.3068     | -2.5104         | -2.5250       |
| 0.0           | 4.2667 | 800  | 0.0000          | 3.9063         | -14.7726         | 1.0                | 18.6788         | -109.4685      | -40.3198     | -2.5107         | -2.5253       |
| 0.0           | 4.5333 | 850  | 0.0000          | 3.9178         | -14.7925         | 1.0                | 18.7103         | -109.5084      | -40.2969     | -2.5104         | -2.5251       |
| 0.0           | 4.8    | 900  | 0.0000          | 3.9074         | -14.8080         | 1.0                | 18.7154         | -109.5393      | -40.3176     | -2.5104         | -2.5251       |
| 0.0           | 5.0667 | 950  | 0.0000          | 3.9080         | -14.8092         | 1.0                | 18.7172         | -109.5417      | -40.3163     | -2.5104         | -2.5251       |
| 0.0           | 5.3333 | 1000 | 0.0000          | 3.9080         | -14.8092         | 1.0                | 18.7172         | -109.5417      | -40.3163     | -2.5104         | -2.5251       |
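
To read the table: in DPO, the implicit per-response reward and the loss are tied together, so a large positive reward margin forces the loss toward zero. The standard definitions (Rafailov et al., 2023) are:

$$r_\theta(x, y) = \beta\,\bigl(\log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x)\bigr), \qquad \mathcal{L}_{\mathrm{DPO}} = -\log \sigma\bigl(r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}})\bigr)$$

With Rewards/margins ≈ 18.72, $\sigma(18.72)$ differs from 1 by about $7 \times 10^{-9}$, so the loss rounds to 0.0000, and Rewards/accuracies is 1.0 because the chosen response out-scores the rejected one on every evaluation pair. Here $\beta = 0.5$ is an assumption inferred from the `05beta` suffix in the model name, not stated elsewhere in this card.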

## Framework versions

- Transformers 4.44.2
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1