---
license: apache-2.0
base_model: mosaicml/mpt-7b-instruct
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: MPT_1000_STEPS_1e5_rate_01_beta_DPO
    results: []
---

MPT_1000_STEPS_1e5_rate_01_beta_DPO

This model is a version of mosaicml/mpt-7b-instruct fine-tuned with Direct Preference Optimization (DPO) on an unknown dataset. It achieves the following results on the evaluation set (a minimal loading example follows the list):

  • Loss: 0.8946
  • Rewards/chosen: -4.4962
  • Rewards/rejected: -4.4462
  • Rewards/accuracies: 0.4901
  • Rewards/margins: -0.0501
  • Logps/rejected: -66.0193
  • Logps/chosen: -65.7547
  • Logits/rejected: 8.4623
  • Logits/chosen: 8.4615
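
As a sketch, the checkpoint can be loaded for inference with standard transformers calls. The repo id below is inferred from this card's title and is an assumption; note that MPT models require trust_remote_code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id assumed from this card's title and owner.
repo = "tsavage68/MPT_1000_STEPS_1e5_rate_01_beta_DPO"

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # required for the MPT architecture
)

inputs = tokenizer("Explain DPO in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```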

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TRL sketch reproducing this configuration follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
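
The sketch below shows how this configuration maps onto TRL's DPOTrainer. It is an assumption-laden reconstruction, not the actual training script: the preference dataset is unknown, beta=0.1 is inferred from the "01_beta" in the model name, and the DPOTrainer signature shown matches TRL releases contemporary with Transformers 4.39 (newer TRL versions move beta into a DPOConfig).

```python
# Minimal sketch, not the author's exact script.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "mosaicml/mpt-7b-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Hypothetical dataset with "prompt"/"chosen"/"rejected" columns;
# the dataset actually used is not recorded in this card.
dataset = load_dataset("some/preference-dataset")

training_args = TrainingArguments(
    output_dir="MPT_1000_STEPS_1e5_rate_01_beta_DPO",
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # total train batch size 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    # Adam betas (0.9, 0.999) and epsilon 1e-8 are the defaults.
)

trainer = DPOTrainer(
    model,
    ref_model=None,        # TRL clones the policy as the frozen reference
    args=training_args,
    beta=0.1,              # assumed from the model name
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```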

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.7056        | 0.05  | 50   | 0.9054          | -1.8795        | -1.8769          | 0.4857             | -0.0027         | -40.3261       | -39.5876     | 13.2447         | 13.2474       |
| 1.3284        | 0.1   | 100  | 1.3365          | -5.2198        | -5.1996          | 0.4835             | -0.0202         | -73.5531       | -72.9898     | 40.0297         | 40.0297       |
| 4.0395        | 0.15  | 150  | 1.2940          | -5.6920        | -5.6131          | 0.4637             | -0.0789         | -77.6884       | -77.7120     | 34.5576         | 34.5577       |
| 1.1998        | 0.2   | 200  | 1.1437          | -4.4153        | -4.3103          | 0.4747             | -0.1050         | -64.6601       | -64.9452     | 14.5309         | 14.5309       |
| 1.0001        | 0.24  | 250  | 1.3580          | -5.0983        | -5.0232          | 0.5033             | -0.0751         | -71.7890       | -71.7751     | 24.0739         | 24.0735       |
| 1.1726        | 0.29  | 300  | 1.0394          | -4.1980        | -4.0831          | 0.4879             | -0.1149         | -62.3888       | -62.7721     | 16.4743         | 16.4742       |
| 1.0955        | 0.34  | 350  | 1.0584          | -4.9210        | -4.7783          | 0.4747             | -0.1427         | -69.3404       | -70.0020     | 20.7178         | 20.7172       |
| 1.2598        | 0.39  | 400  | 1.0408          | -3.8776        | -3.8210          | 0.4945             | -0.0566         | -59.7678       | -59.5681     | 17.0600         | 17.0587       |
| 1.2403        | 0.44  | 450  | 0.9855          | -4.8112        | -4.6991          | 0.4747             | -0.1121         | -68.5488       | -68.9046     | 10.9237         | 10.9226       |
| 1.2967        | 0.49  | 500  | 0.9814          | -4.7410        | -4.6563          | 0.4769             | -0.0846         | -68.1207       | -68.2017     | 15.1832         | 15.1825       |
| 1.152         | 0.54  | 550  | 0.9258          | -4.6800        | -4.6273          | 0.4989             | -0.0527         | -67.8303       | -67.5925     | 9.7415          | 9.7409        |
| 0.9473        | 0.59  | 600  | 0.9416          | -3.6301        | -3.6600          | 0.5341             | 0.0299          | -58.1573       | -57.0931     | 10.5794         | 10.5787       |
| 0.9534        | 0.64  | 650  | 0.9361          | -4.7539        | -4.6806          | 0.4681             | -0.0733         | -68.3630       | -68.3308     | 11.2450         | 11.2442       |
| 0.985         | 0.68  | 700  | 0.9194          | -4.5437        | -4.5232          | 0.5011             | -0.0205         | -66.7896       | -66.2292     | 9.1942          | 9.1934        |
| 0.97          | 0.73  | 750  | 0.9090          | -4.6508        | -4.5989          | 0.4835             | -0.0520         | -67.5462       | -67.3006     | 8.0813          | 8.0806        |
| 0.8148        | 0.78  | 800  | 0.8992          | -4.5695        | -4.5180          | 0.4923             | -0.0515         | -66.7373       | -66.4875     | 8.3458          | 8.3450        |
| 0.9668        | 0.83  | 850  | 0.8976          | -4.5172        | -4.4650          | 0.4901             | -0.0521         | -66.2078       | -65.9638     | 8.2885          | 8.2877        |
| 0.9438        | 0.88  | 900  | 0.8952          | -4.4950        | -4.4441          | 0.4923             | -0.0509         | -65.9988       | -65.7424     | 8.4833          | 8.4825        |
| 1.0069        | 0.93  | 950  | 0.8954          | -4.4971        | -4.4461          | 0.4901             | -0.0510         | -66.0188       | -65.7634     | 8.4615          | 8.4607        |
| 0.7377        | 0.98  | 1000 | 0.8946          | -4.4962        | -4.4462          | 0.4901             | -0.0501         | -66.0193       | -65.7547     | 8.4623          | 8.4615        |
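
For reference, the reward columns follow TRL's standard DPO definitions: the implicit reward of a completion is the β-scaled log-probability ratio between the policy and the frozen reference model (with β = 0.1 assumed from the model name), and the margin is the gap between chosen and rejected rewards:

$$
\text{rewards/chosen} = \beta \bigl( \log \pi_\theta(y_w \mid x) - \log \pi_{\mathrm{ref}}(y_w \mid x) \bigr),
\qquad
\text{rewards/margins} = \text{rewards/chosen} - \text{rewards/rejected}
$$

Accuracies near 0.49 and a slightly negative final margin indicate that, on this evaluation set, the final policy does not rank chosen responses above rejected ones.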

Framework versions

  • Transformers 4.39.1
  • PyTorch 2.0.0+cu117
  • Datasets 2.18.0
  • Tokenizers 0.15.2