---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: mistralit2_1000_STEPS_1e5_rate_0.1_beta_DPO
    results: []
---

# mistralit2_1000_STEPS_1e5_rate_0.1_beta_DPO

This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 1.4935
- Rewards/chosen: -6.2215
- Rewards/rejected: -5.6448
- Rewards/accuracies: 0.3626
- Rewards/margins: -0.5767
- Logps/rejected: -85.0207
- Logps/chosen: -85.6008
- Logits/rejected: -5.8605
- Logits/chosen: -5.8604
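For context, these metrics follow the standard DPO definitions (from the DPO paper and TRL, not restated on this card): each reward is the β-scaled log-probability ratio of the policy over the frozen reference model, and the margin is their difference on a preference pair:

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\text{margins} = r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})
$$

As a sanity check against the numbers above, -6.2215 - (-5.6448) = -0.5767, matching the reported margin. A negative margin together with an accuracy below 0.5 (here 0.3626) means that on most evaluation pairs the trained policy assigns a higher reward to the rejected response than to the chosen one.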

## Model description

More information needed

## Intended uses & limitations

More information needed
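Pending more details, the checkpoint should load like any Transformers causal LM. Below is a minimal generation sketch, not taken from this card: the repo id is assumed from the uploader name, and the chat template is inherited from the Mistral-Instruct base model.

```python
# Minimal inference sketch. The repo id is an assumption based on the
# uploader name; everything else is the standard Transformers API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "tsavage68/mistralit2_1000_STEPS_1e5_rate_0.1_beta_DPO"  # assumed
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.float16, device_map="auto"
)

# Mistral-Instruct models expect [INST] ... [/INST] turns; the tokenizer's
# chat template produces that formatting from role/content messages.
messages = [{"role": "user", "content": "Summarize what DPO fine-tuning does."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```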

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged TRL mapping is sketched after the list):

- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
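As a rough guide only, here is how those values could map onto TRL's `DPOTrainer` in the 0.7.x-era API that matches Transformers 4.38.2. The dataset contents are placeholders and `beta=0.1` is read off the "0.1_beta" in the model name; neither is stated on this card.

```python
# Sketch: maps the card's hyperparameters onto TRL's DPOTrainer
# (TRL 0.7.x-era API). Dataset contents and beta=0.1 are assumptions.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# DPO trains on preference triples; real data would replace these placeholders.
pairs = {
    "prompt": ["Explain DPO in one sentence."],
    "chosen": ["DPO fine-tunes a model directly on preference pairs."],
    "rejected": ["DPO is a kind of tokenizer."],
}
train_ds = Dataset.from_dict(pairs)
eval_ds = Dataset.from_dict(pairs)

args = TrainingArguments(
    output_dir="mistralit2_1000_STEPS_1e5_rate_0.1_beta_DPO",
    per_device_train_batch_size=4,   # train_batch_size: 4
    per_device_eval_batch_size=1,    # eval_batch_size: 1
    gradient_accumulation_steps=2,   # total_train_batch_size: 8
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_steps=100,                # lr_scheduler_warmup_steps: 100
    max_steps=1000,                  # training_steps: 1000
    seed=42,
)

trainer = DPOTrainer(
    model,
    ref_model=None,          # TRL snapshots the model as the frozen reference
    args=args,
    beta=0.1,                # assumed from "0.1_beta" in the model name
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    tokenizer=tokenizer,
)
trainer.train()
```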

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 1.1357        | 0.1   | 50   | 1.1734          | -1.9602        | -1.6509          | 0.3582             | -0.3094         | -45.0812       | -42.9883     | -3.1053         | -3.1052       |
| 1.7275        | 0.2   | 100  | 1.5539          | -4.8260        | -4.4502          | 0.3978             | -0.3758         | -73.0739       | -71.6456     | -2.7839         | -2.7839       |
| 1.6716        | 0.29  | 150  | 1.4805          | -4.2682        | -3.8441          | 0.3890             | -0.4241         | -67.0136       | -66.0676     | -3.8634         | -3.8634       |
| 1.9883        | 0.39  | 200  | 1.4624          | -4.1549        | -3.7121          | 0.3648             | -0.4429         | -65.6932       | -64.9352     | -4.6023         | -4.6023       |
| 1.2968        | 0.49  | 250  | 1.4720          | -4.1636        | -3.7323          | 0.3802             | -0.4312         | -65.8957       | -65.0215     | -4.0699         | -4.0699       |
| 1.5145        | 0.59  | 300  | 1.4656          | -4.1401        | -3.6836          | 0.3626             | -0.4564         | -65.4088       | -64.7864     | -4.8231         | -4.8231       |
| 1.7123        | 0.68  | 350  | 1.4617          | -4.1237        | -3.6671          | 0.3670             | -0.4567         | -65.2432       | -64.6233     | -4.7696         | -4.7696       |
| 1.295         | 0.78  | 400  | 1.4632          | -4.1764        | -3.7222          | 0.3714             | -0.4543         | -65.7941       | -65.1502     | -4.9799         | -4.9799       |
| 1.405         | 0.88  | 450  | 1.4666          | -4.1922        | -3.7464          | 0.3714             | -0.4458         | -66.0363       | -65.3076     | -5.0856         | -5.0856       |
| 1.9129        | 0.98  | 500  | 1.4701          | -4.2370        | -3.7742          | 0.3648             | -0.4628         | -66.3146       | -65.7560     | -5.1195         | -5.1195       |
| 1.2959        | 1.07  | 550  | 1.4889          | -4.3597        | -3.8796          | 0.3692             | -0.4802         | -67.3681       | -66.9833     | -5.1899         | -5.1899       |
| 1.2707        | 1.17  | 600  | 1.5193          | -4.6364        | -4.1231          | 0.3582             | -0.5133         | -69.8035       | -69.7498     | -5.9136         | -5.9136       |
| 1.3242        | 1.27  | 650  | 1.5168          | -4.6159        | -4.1101          | 0.3538             | -0.5057         | -69.6739       | -69.5444     | -5.3603         | -5.3603       |
| 1.397         | 1.37  | 700  | 2.1272          | -6.5216        | -6.2977          | 0.4022             | -0.2239         | -91.5493       | -88.6020     | -3.4923         | -3.4922       |
| 1.3107        | 1.46  | 750  | 1.4798          | -4.5654        | -4.0673          | 0.3626             | -0.4981         | -69.2450       | -69.0399     | -5.4624         | -5.4624       |
| 1.2491        | 1.56  | 800  | 1.4610          | -4.8769        | -4.3575          | 0.3648             | -0.5193         | -72.1476       | -72.1544     | -5.2893         | -5.2893       |
| 1.3924        | 1.66  | 850  | 1.4805          | -5.8437        | -5.2709          | 0.3473             | -0.5728         | -81.2817       | -81.8233     | -5.6057         | -5.6058       |
| 1.1725        | 1.76  | 900  | 1.4957          | -6.2498        | -5.6711          | 0.3626             | -0.5787         | -85.2834       | -85.8838     | -5.8532         | -5.8531       |
| 1.2113        | 1.86  | 950  | 1.4937          | -6.2249        | -5.6485          | 0.3626             | -0.5763         | -85.0578       | -85.6343     | -5.8631         | -5.8630       |
| 1.5057        | 1.95  | 1000 | 1.4935          | -6.2215        | -5.6448          | 0.3626             | -0.5767         | -85.0207       | -85.6008     | -5.8605         | -5.8604       |

### Framework versions

- Transformers 4.38.2
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2