---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: mistralit2_1000_STEPS_1e5_rate_0.1_beta_DPO
    results: []
---

# mistralit2_1000_STEPS_1e5_rate_0.1_beta_DPO

This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), trained with Direct Preference Optimization (DPO) on an unspecified preference dataset. It achieves the following results on the evaluation set:

- Loss: 0.9939
- Rewards/chosen: -3.9532
- Rewards/rejected: -5.6547
- Rewards/accuracies: 0.6000
- Rewards/margins: 1.7015
- Logps/rejected: -85.1197
- Logps/chosen: -62.9180
- Logits/rejected: -2.0229
- Logits/chosen: -2.0243
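
The reward columns are DPO's implicit rewards: β-scaled log-probability ratios between the fine-tuned policy and the frozen reference model, with β = 0.1 here (the `0.1_beta` in the model name). As a sketch, assuming the standard sigmoid DPO loss from Rafailov et al. (2023):

```latex
% Implicit reward for response y to prompt x (beta = 0.1 for this run)
r(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\text{ref}}(y \mid x)}

% Loss over preference pairs; y_w = chosen, y_l = rejected.
% "Rewards/margins" above is the mean of r(x, y_w) - r(x, y_l).
\mathcal{L}_{\text{DPO}} = -\mathbb{E}_{(x, y_w, y_l)}
  \left[ \log \sigma\big( r(x, y_w) - r(x, y_l) \big) \right]
```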

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
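
The training script itself is not included in this card. Below is a minimal sketch of how the hyperparameters above map onto TRL's `DPOTrainer` (API as of TRL ~0.7, matching the Transformers 4.38 pin); the dataset identifier is a placeholder, since the card does not name the data:

```python
# Hypothetical reconstruction; the actual training script is not part of this card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Placeholder: the card does not identify the preference dataset.
# Any dataset with "prompt"/"chosen"/"rejected" columns fits here.
dataset = load_dataset("your/preference-dataset", split="train")

training_args = TrainingArguments(
    output_dir="mistralit2_1000_STEPS_1e5_rate_0.1_beta_DPO",
    learning_rate=1e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # effective batch size 8
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model,
    ref_model=None,   # TRL keeps a frozen copy of the policy as the reference
    args=training_args,
    beta=0.1,         # the "0.1_beta" in the model name
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```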

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6073        | 0.1   | 50   | 0.6623          | -1.2716        | -1.5743          | 0.5736             | 0.3026          | -44.3150       | -36.1020     | -2.8014         | -2.8019       |
| 0.7223        | 0.2   | 100  | 0.7934          | -3.0203        | -3.2538          | 0.5077             | 0.2336          | -61.1108       | -53.5883     | -2.4237         | -2.4243       |
| 0.8563        | 0.29  | 150  | 0.7580          | -1.8675        | -2.3470          | 0.5604             | 0.4795          | -52.0427       | -42.0607     | -2.5521         | -2.5529       |
| 0.7701        | 0.39  | 200  | 0.7631          | -1.8702        | -2.1583          | 0.5231             | 0.2882          | -50.1556       | -42.0875     | -2.7052         | -2.7056       |
| 0.8749        | 0.49  | 250  | 0.7941          | -2.4787        | -2.6066          | 0.4879             | 0.1279          | -54.6385       | -48.1731     | -2.8184         | -2.8189       |
| 0.6954        | 0.59  | 300  | 0.8039          | -1.5721        | -1.9872          | 0.5473             | 0.4151          | -48.4439       | -39.1064     | -2.8263         | -2.8268       |
| 0.733         | 0.68  | 350  | 0.7751          | -0.5753        | -1.0891          | 0.5253             | 0.5138          | -39.4632       | -29.1387     | -2.7587         | -2.7591       |
| 0.8256        | 0.78  | 400  | 0.7376          | -1.2950        | -1.7911          | 0.5516             | 0.4962          | -46.4838       | -36.3354     | -2.9702         | -2.9707       |
| 0.6485        | 0.88  | 450  | 0.7344          | -1.7798        | -2.3960          | 0.5692             | 0.6162          | -52.5322       | -41.1838     | -2.7167         | -2.7174       |
| 0.612         | 0.98  | 500  | 0.7051          | -1.3500        | -2.0968          | 0.5978             | 0.7467          | -49.5400       | -36.8863     | -2.5131         | -2.5138       |
| 0.2108        | 1.07  | 550  | 0.7799          | -2.0131        | -3.4580          | 0.6418             | 1.4449          | -63.1524       | -43.5171     | -2.2469         | -2.2482       |
| 0.1378        | 1.17  | 600  | 0.9314          | -3.4717        | -5.1214          | 0.6198             | 1.6497          | -79.7863       | -58.1027     | -1.9917         | -1.9933       |
| 0.188         | 1.27  | 650  | 0.9857          | -3.6647        | -5.3449          | 0.6198             | 1.6803          | -82.0219       | -60.0328     | -1.9585         | -1.9601       |
| 0.3739        | 1.37  | 700  | 1.0046          | -3.6506        | -5.3352          | 0.6176             | 1.6846          | -81.9245       | -59.8915     | -2.0334         | -2.0349       |
| 0.0428        | 1.46  | 750  | 0.9881          | -3.8094        | -5.4955          | 0.6088             | 1.6861          | -83.5278       | -61.4803     | -2.0272         | -2.0287       |
| 0.131         | 1.56  | 800  | 0.9900          | -3.9653        | -5.6306          | 0.6022             | 1.6653          | -84.8782       | -63.0390     | -2.0228         | -2.0242       |
| 0.1558        | 1.66  | 850  | 0.9943          | -3.9735        | -5.6628          | 0.6000             | 1.6893          | -85.2000       | -63.1207     | -2.0177         | -2.0191       |
| 0.1876        | 1.76  | 900  | 0.9939          | -3.9576        | -5.6566          | 0.6000             | 1.6989          | -85.1381       | -62.9622     | -2.0227         | -2.0241       |
| 0.1415        | 1.86  | 950  | 0.9945          | -3.9552        | -5.6536          | 0.6022             | 1.6984          | -85.1084       | -62.9377     | -2.0232         | -2.0246       |
| 0.1163        | 1.95  | 1000 | 0.9939          | -3.9532        | -5.6547          | 0.6000             | 1.7015          | -85.1197       | -62.9180     | -2.0229         | -2.0243       |

### Framework versions

- Transformers 4.38.2
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2
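
For completeness, a minimal inference sketch against the pinned Transformers version. The repo id is assumed from this card's model name and owner; Mistral-Instruct checkpoints use the `[INST] ... [/INST]` chat format, which `apply_chat_template` emits automatically:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id assumed from the model name on this card.
model_id = "tsavage68/mistralit2_1000_STEPS_1e5_rate_0.1_beta_DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Build the Mistral-Instruct prompt from a chat message.
messages = [{"role": "user", "content": "Summarize what DPO fine-tuning does."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

# Greedy decoding; print only the newly generated tokens.
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```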