---
library_name: transformers
license: llama3
base_model: tsavage68/IE_L3_1000steps_1e6rate_SFT
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: IE_L3_1000steps_1e7rate_01beta_cSFTDPO
    results: []
---

IE_L3_1000steps_1e7rate_01beta_cSFTDPO

This model is a DPO fine-tuned version of tsavage68/IE_L3_1000steps_1e6rate_SFT, trained on an unspecified preference dataset. It achieves the following results on the evaluation set (a minimal usage sketch follows the list):

  • Loss: 0.1803
  • Rewards/chosen: -0.5346
  • Rewards/rejected: -8.6468
  • Rewards/accuracies: 0.7400
  • Rewards/margins: 8.1123
  • Logps/rejected: -162.0956
  • Logps/chosen: -88.1433
  • Logits/rejected: -0.8498
  • Logits/chosen: -0.7319
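
Below is a minimal inference sketch using the standard transformers API. The prompt text and generation settings are illustrative assumptions; the card does not document a prompt format.

```python
# Minimal usage sketch; prompt format and dtype are assumptions, not
# documented in this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/IE_L3_1000steps_1e7rate_01beta_cSFTDPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Your prompt here", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```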

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged trl configuration sketch follows the list):

  • learning_rate: 1e-07
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
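
The hyperparameters above map onto trl's DPOTrainer roughly as in the sketch below. This is a reconstruction, not the author's script: the preference dataset is a placeholder, the exact trl version is unknown, and beta=0.1 is inferred from the "01beta" in the model name.

```python
# Hedged reconstruction of the DPO setup implied by the hyperparameters above.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_model = "tsavage68/IE_L3_1000steps_1e6rate_SFT"
model = AutoModelForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Placeholder path: any dataset with "prompt"/"chosen"/"rejected" columns.
train_dataset = load_dataset("path/to/preference_dataset", split="train")

config = DPOConfig(
    output_dir="IE_L3_1000steps_1e7rate_01beta_cSFTDPO",
    beta=0.1,                       # assumed from "01beta" in the model name
    learning_rate=1e-7,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size: 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,                        # Adam betas/epsilon are the defaults above
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # older trl API; newer versions use processing_class
)
trainer.train()
```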

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6292 | 0.4 | 50 | 0.5972 | -0.0178 | -0.2247 | 0.7400 | 0.2070 | -77.8745 | -82.9754 | -0.7952 | -0.7369 |
| 0.2432 | 0.8 | 100 | 0.2531 | -0.1984 | -1.9084 | 0.7400 | 1.7099 | -94.7109 | -84.7823 | -0.7935 | -0.7222 |
| 0.1468 | 1.2 | 150 | 0.1842 | -0.4156 | -4.4900 | 0.7400 | 4.0744 | -120.5273 | -86.9542 | -0.8149 | -0.7193 |
| 0.1745 | 1.6 | 200 | 0.1807 | -0.4305 | -6.5857 | 0.7400 | 6.1551 | -141.4839 | -87.1031 | -0.8342 | -0.7283 |
| 0.2254 | 2.0 | 250 | 0.1805 | -0.4554 | -7.3110 | 0.7400 | 6.8555 | -148.7368 | -87.3519 | -0.8373 | -0.7278 |
| 0.1389 | 2.4 | 300 | 0.1804 | -0.4666 | -7.7073 | 0.7400 | 7.2408 | -152.7006 | -87.4635 | -0.8397 | -0.7280 |
| 0.1215 | 2.8 | 350 | 0.1804 | -0.4933 | -8.0779 | 0.7400 | 7.5846 | -156.4058 | -87.7304 | -0.8446 | -0.7309 |
| 0.191 | 3.2 | 400 | 0.1804 | -0.5121 | -8.2398 | 0.7400 | 7.7277 | -158.0253 | -87.9188 | -0.8463 | -0.7322 |
| 0.1906 | 3.6 | 450 | 0.1804 | -0.5199 | -8.2886 | 0.7400 | 7.7687 | -158.5128 | -87.9963 | -0.8471 | -0.7317 |
| 0.2084 | 4.0 | 500 | 0.1804 | -0.5104 | -8.4325 | 0.7400 | 7.9221 | -159.9520 | -87.9018 | -0.8488 | -0.7326 |
| 0.1561 | 4.4 | 550 | 0.1803 | -0.5293 | -8.5197 | 0.7400 | 7.9905 | -160.8244 | -88.0903 | -0.8493 | -0.7326 |
| 0.1213 | 4.8 | 600 | 0.1803 | -0.5356 | -8.5680 | 0.7400 | 8.0324 | -161.3075 | -88.1538 | -0.8503 | -0.7332 |
| 0.1907 | 5.2 | 650 | 0.1803 | -0.5333 | -8.6184 | 0.7400 | 8.0851 | -161.8111 | -88.1307 | -0.8505 | -0.7330 |
| 0.2427 | 5.6 | 700 | 0.1803 | -0.5362 | -8.6233 | 0.7400 | 8.0871 | -161.8604 | -88.1602 | -0.8507 | -0.7332 |
| 0.2601 | 6.0 | 750 | 0.1803 | -0.5367 | -8.6352 | 0.7400 | 8.0985 | -161.9794 | -88.1651 | -0.8509 | -0.7332 |
| 0.1213 | 6.4 | 800 | 0.1803 | -0.5353 | -8.6312 | 0.7400 | 8.0960 | -161.9397 | -88.1506 | -0.8507 | -0.7334 |
| 0.2426 | 6.8 | 850 | 0.1803 | -0.5305 | -8.6468 | 0.7400 | 8.1163 | -162.0951 | -88.1023 | -0.8507 | -0.7328 |
| 0.1733 | 7.2 | 900 | 0.1803 | -0.5246 | -8.6359 | 0.7400 | 8.1112 | -161.9858 | -88.0442 | -0.8503 | -0.7323 |
| 0.1388 | 7.6 | 950 | 0.1803 | -0.5346 | -8.6468 | 0.7400 | 8.1123 | -162.0956 | -88.1433 | -0.8498 | -0.7319 |
| 0.1561 | 8.0 | 1000 | 0.1803 | -0.5346 | -8.6468 | 0.7400 | 8.1123 | -162.0956 | -88.1433 | -0.8498 | -0.7319 |
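
For context on the reward columns: in DPO, the implicit reward of a completion is beta times the log-ratio of the policy to the reference model (here, the SFT base), and Rewards/margins is the chosen reward minus the rejected reward (for the final row, -0.5346 - (-8.6468) ≈ 8.1123). A sketch of the objective follows, with beta = 0.1 inferred from the model name rather than stated in the card.

```latex
% DPO implicit reward and loss (Rafailov et al., 2023); beta = 0.1 is
% inferred from the "01beta" in the model name, not stated in this card.
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x, y_w, y_l)}
  \left[ \log \sigma\!\big( r_\theta(x, y_w) - r_\theta(x, y_l) \big) \right]
```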

Framework versions

  • Transformers 4.44.2
  • PyTorch 2.0.0+cu117
  • Datasets 3.0.0
  • Tokenizers 0.19.1