---
library_name: transformers
license: apache-2.0
base_model: tsavage68/Na_M2_1000steps_1e7_SFT
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: Na_M2_1000steps_1e7rate_01beta_cSFTDPO
    results: []
---

# Na_M2_1000steps_1e7rate_01beta_cSFTDPO

This model is a fine-tuned version of [tsavage68/Na_M2_1000steps_1e7_SFT](https://huggingface.co/tsavage68/Na_M2_1000steps_1e7_SFT) on an unknown dataset. It achieves the following results on the evaluation set (a minimal loading sketch follows the list):

- Loss: 0.0000
- Rewards/chosen: 2.1377
- Rewards/rejected: -11.0023
- Rewards/accuracies: 1.0
- Rewards/margins: 13.1400
- Logps/rejected: -189.9462
- Logps/chosen: -26.7554
- Logits/rejected: -2.3910
- Logits/chosen: -2.4209
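
The card ships without a usage example, so here is a minimal loading sketch with `transformers`. The repository id and the causal-LM head are assumptions inferred from the card's model name and its `trl`/`dpo` tags, not stated in the card itself.

```python
# Minimal inference sketch. ASSUMPTIONS: the repo id below mirrors the
# card's model name, and the model is a causal LM (suggested by the
# trl/dpo tags); neither is confirmed by the card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/Na_M2_1000steps_1e7rate_01beta_cSFTDPO"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Your prompt here", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```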

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of how they map onto TRL's trainer follows the list):

- learning_rate: 1e-07
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
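
For orientation, this is a sketch of how the hyperparameters above map onto TRL's `DPOConfig`/`DPOTrainer`. The preference dataset is undocumented, the `beta=0.1` value is inferred from the `01beta` suffix in the model name, and argument names differ across TRL versions (newer releases pass the tokenizer as `processing_class`); treat this as an approximation, not the training script.

```python
# Hedged reconstruction of the training setup from the listed
# hyperparameters. The dataset and beta are NOT documented in the card.
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/Na_M2_1000steps_1e7_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

config = DPOConfig(
    output_dir="Na_M2_1000steps_1e7rate_01beta_cSFTDPO",
    learning_rate=1e-7,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # effective train batch size 2 * 2 = 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    beta=0.1,                        # assumed from the "01beta" name suffix
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=preference_dataset,  # hypothetical: chosen/rejected pairs
    tokenizer=tokenizer,               # processing_class= in newer TRL
)
trainer.train()
```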

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.0514        | 0.2667 | 50   | 0.0095          | 1.0907         | -3.6125          | 1.0                | 4.7032          | -116.0486      | -37.2253     | -2.5048         | -2.5196       |
| 0.0           | 0.5333 | 100  | 0.0000          | 1.9516         | -8.2814          | 1.0                | 10.2330         | -162.7370      | -28.6162     | -2.4273         | -2.4516       |
| 0.0           | 0.8    | 150  | 0.0000          | 2.0205         | -8.9692          | 1.0                | 10.9897         | -169.6156      | -27.9274     | -2.4141         | -2.4403       |
| 0.0           | 1.0667 | 200  | 0.0000          | 2.0546         | -9.4358          | 1.0                | 11.4904         | -174.2812      | -27.5861     | -2.4057         | -2.4333       |
| 0.0           | 1.3333 | 250  | 0.0000          | 2.0861         | -9.8928          | 1.0                | 11.9789         | -178.8511      | -27.2716     | -2.4011         | -2.4294       |
| 0.0           | 1.6    | 300  | 0.0000          | 2.0968         | -10.1847         | 1.0                | 12.2815         | -181.7704      | -27.1646     | -2.3981         | -2.4268       |
| 0.0           | 1.8667 | 350  | 0.0000          | 2.1068         | -10.4154         | 1.0                | 12.5222         | -184.0774      | -27.0641     | -2.3951         | -2.4241       |
| 0.0           | 2.1333 | 400  | 0.0000          | 2.1173         | -10.5894         | 1.0                | 12.7067         | -185.8174      | -26.9596     | -2.3948         | -2.4241       |
| 0.0           | 2.4    | 450  | 0.0000          | 2.1209         | -10.7301         | 1.0                | 12.8510         | -187.2248      | -26.9235     | -2.3923         | -2.4219       |
| 0.0           | 2.6667 | 500  | 0.0000          | 2.1295         | -10.8281         | 1.0                | 12.9576         | -188.2044      | -26.8375     | -2.3924         | -2.4220       |
| 0.0           | 2.9333 | 550  | 0.0000          | 2.1355         | -10.9054         | 1.0                | 13.0409         | -188.9772      | -26.7771     | -2.3914         | -2.4212       |
| 0.0           | 3.2    | 600  | 0.0000          | 2.1356         | -10.9448         | 1.0                | 13.0805         | -189.3718      | -26.7761     | -2.3903         | -2.4200       |
| 0.0           | 3.4667 | 650  | 0.0000          | 2.1418         | -10.9896         | 1.0                | 13.1314         | -189.8192      | -26.7140     | -2.3895         | -2.4193       |
| 0.0           | 3.7333 | 700  | 0.0000          | 2.1378         | -11.0004         | 1.0                | 13.1382         | -189.9273      | -26.7544     | -2.3901         | -2.4200       |
| 0.0           | 4.0    | 750  | 0.0000          | 2.1390         | -11.0020         | 1.0                | 13.1409         | -189.9431      | -26.7428     | -2.3910         | -2.4208       |
| 0.0           | 4.2667 | 800  | 0.0000          | 2.1358         | -11.0021         | 1.0                | 13.1378         | -189.9439      | -26.7747     | -2.3902         | -2.4201       |
| 0.0           | 4.5333 | 850  | 0.0000          | 2.1380         | -11.0024         | 1.0                | 13.1404         | -189.9469      | -26.7523     | -2.3908         | -2.4207       |
| 0.0           | 4.8    | 900  | 0.0000          | 2.1377         | -11.0023         | 1.0                | 13.1400         | -189.9462      | -26.7554     | -2.3910         | -2.4209       |
| 0.0           | 5.0667 | 950  | 0.0000          | 2.1377         | -11.0023         | 1.0                | 13.1400         | -189.9462      | -26.7554     | -2.3910         | -2.4209       |
| 0.0           | 5.3333 | 1000 | 0.0000          | 2.1377         | -11.0023         | 1.0                | 13.1400         | -189.9462      | -26.7554     | -2.3910         | -2.4209       |
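
To read the reward columns: under the standard DPO formulation (Rafailov et al., 2023), which the `dpo` tag points to but the card does not spell out, the logged rewards are the implicit reward below, and `Rewards/margins` is the mean chosen-minus-rejected gap. A validation loss of 0.0000 with accuracy 1.0 means the margin grew large enough that the sigmoid DPO loss saturated on the eval set.

```latex
% Implicit DPO reward for completion y given prompt x, with policy
% \pi_\theta, frozen reference \pi_{\mathrm{ref}}, and temperature \beta:
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}

% The table's Rewards/margins column is the mean of:
r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}})
```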

### Framework versions

- Transformers 4.44.2
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1