---
library_name: transformers
license: llama3
base_model: tsavage68/IE_L3_1000steps_1e6rate_SFT
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: IE_L3_1000steps_1e5rate_01beta_cSFTDPO
    results: []
---

# IE_L3_1000steps_1e5rate_01beta_cSFTDPO

This model is a DPO fine-tuned version of tsavage68/IE_L3_1000steps_1e6rate_SFT on an unknown dataset. It achieves the following results on the evaluation set (a consistency check on the reward margin follows the list):

- Loss: 0.1802
- Rewards/chosen: -0.6743
- Rewards/rejected: -17.3206
- Rewards/accuracies: 0.7400
- Rewards/margins: 16.6463
- Logps/rejected: -248.8334
- Logps/chosen: -89.5409
- Logits/rejected: -0.7455
- Logits/chosen: -0.5957
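
For orientation, TRL reports Rewards/margins as the chosen reward minus the rejected reward, and the final evaluation numbers above are consistent with that definition:

$$\text{margins} = \text{chosen} - \text{rejected} = -0.6743 - (-17.3206) = 16.6463$$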

## Model description

More information needed

## Intended uses & limitations

More information needed
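
In the absence of documented usage guidance, here is a minimal loading sketch with the transformers library. The model id comes from this card; the prompt and generation settings are illustrative placeholders, since the expected prompt format is not documented:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/IE_L3_1000steps_1e5rate_01beta_cSFTDPO"  # id from this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Placeholder prompt: the prompt format this model expects is not documented here.
inputs = tokenizer("Your prompt here", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```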

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto a TRL trainer follows the list):

- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
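
As a rough illustration of how these settings map onto a training script, here is a minimal sketch assuming TRL's `DPOConfig`/`DPOTrainer` API. The beta value is inferred from the `01beta` suffix in the model name, and the dummy preference dataset is a placeholder, since the actual training data is undocumented:

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/IE_L3_1000steps_1e6rate_SFT"  # SFT base model named on this card
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Dummy preference data; the real training set for this model is not documented.
train_dataset = Dataset.from_dict({
    "prompt": ["Example prompt"],
    "chosen": ["Preferred response"],
    "rejected": ["Dispreferred response"],
})

config = DPOConfig(
    output_dir="IE_L3_1000steps_1e5rate_01beta_cSFTDPO",
    beta=0.1,                       # assumed from the "01beta" suffix in the model name
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # effective batch size 4, matching the card
    max_steps=1000,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    seed=42,
)

trainer = DPOTrainer(
    model=model,                    # a frozen reference copy is created automatically
    args=config,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```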

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.1906 | 0.4 | 50 | 0.1802 | -1.4004 | -15.2732 | 0.7400 | 13.8728 | -228.3591 | -96.8015 | -0.9287 | -0.7696 |
| 0.1386 | 0.8 | 100 | 0.1802 | -1.5499 | -16.6031 | 0.7400 | 15.0532 | -241.6585 | -98.2971 | -0.9445 | -0.7764 |
| 0.1386 | 1.2 | 150 | 0.1802 | -0.6661 | -17.0830 | 0.7400 | 16.4169 | -246.4570 | -89.4588 | -0.7451 | -0.5958 |
| 0.1733 | 1.6 | 200 | 0.1802 | -0.6529 | -17.0537 | 0.7400 | 16.4009 | -246.1647 | -89.3264 | -0.7451 | -0.5961 |
| 0.2253 | 2.0 | 250 | 0.1802 | -0.6671 | -17.0542 | 0.7400 | 16.3871 | -246.1687 | -89.4687 | -0.7452 | -0.5962 |
| 0.1386 | 2.4 | 300 | 0.1802 | -0.6548 | -17.0821 | 0.7400 | 16.4273 | -246.4482 | -89.3456 | -0.7451 | -0.5961 |
| 0.1213 | 2.8 | 350 | 0.1802 | -0.6721 | -17.1171 | 0.7400 | 16.4449 | -246.7978 | -89.5189 | -0.7458 | -0.5962 |
| 0.1906 | 3.2 | 400 | 0.1802 | -0.6653 | -17.1157 | 0.7400 | 16.4504 | -246.7844 | -89.4512 | -0.7457 | -0.5962 |
| 0.1906 | 3.6 | 450 | 0.1802 | -0.6617 | -17.1771 | 0.7400 | 16.5154 | -247.3981 | -89.4149 | -0.7446 | -0.5950 |
| 0.2079 | 4.0 | 500 | 0.1802 | -0.6833 | -17.2332 | 0.7400 | 16.5498 | -247.9588 | -89.6311 | -0.7448 | -0.5952 |
| 0.156 | 4.4 | 550 | 0.1802 | -0.6867 | -17.2422 | 0.7400 | 16.5555 | -248.0496 | -89.6649 | -0.7452 | -0.5954 |
| 0.1213 | 4.8 | 600 | 0.1802 | -0.6777 | -17.2605 | 0.7400 | 16.5828 | -248.2325 | -89.5749 | -0.7448 | -0.5947 |
| 0.1906 | 5.2 | 650 | 0.1802 | -0.6873 | -17.3035 | 0.7400 | 16.6161 | -248.6618 | -89.6710 | -0.7453 | -0.5953 |
| 0.2426 | 5.6 | 700 | 0.1802 | -0.6716 | -17.3133 | 0.7400 | 16.6417 | -248.7606 | -89.5142 | -0.7451 | -0.5951 |
| 0.2599 | 6.0 | 750 | 0.1802 | -0.6787 | -17.2980 | 0.7400 | 16.6193 | -248.6074 | -89.5846 | -0.7451 | -0.5953 |
| 0.1213 | 6.4 | 800 | 0.1802 | -0.6753 | -17.3101 | 0.7400 | 16.6349 | -248.7285 | -89.5503 | -0.7448 | -0.5951 |
| 0.2426 | 6.8 | 850 | 0.1802 | -0.6754 | -17.3267 | 0.7400 | 16.6514 | -248.8946 | -89.5515 | -0.7444 | -0.5947 |
| 0.1733 | 7.2 | 900 | 0.1802 | -0.6764 | -17.3102 | 0.7400 | 16.6338 | -248.7291 | -89.5621 | -0.7454 | -0.5955 |
| 0.1386 | 7.6 | 950 | 0.1802 | -0.6732 | -17.3134 | 0.7400 | 16.6401 | -248.7610 | -89.5300 | -0.7454 | -0.5955 |
| 0.156 | 8.0 | 1000 | 0.1802 | -0.6743 | -17.3206 | 0.7400 | 16.6463 | -248.8334 | -89.5409 | -0.7455 | -0.5957 |

### Framework versions

- Transformers 4.44.2
- PyTorch 2.0.0+cu117
- Datasets 3.0.0
- Tokenizers 0.19.1