---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: mistralit2_1000_STEPS_1e5_rate_0.1_beta_DPO
  results: []
---
# mistralit2_1000_STEPS_1e5_rate_0.1_beta_DPO
This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.4935
- Rewards/chosen: -6.2215
- Rewards/rejected: -5.6448
- Rewards/accuracies: 0.3626
- Rewards/margins: -0.5767
- Logps/rejected: -85.0207
- Logps/chosen: -85.6008
- Logits/rejected: -5.8605
- Logits/chosen: -5.8604
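
As background (not something documented in this card), these reward metrics follow the convention used by TRL's `DPOTrainer`: the implicit reward of a completion is the β-scaled log-probability ratio between the fine-tuned policy and the frozen reference model,

$$
r(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right),
$$

so `Rewards/margins` is `Rewards/chosen` minus `Rewards/rejected`, and `Rewards/accuracies` is the fraction of evaluation pairs whose chosen response receives the higher implicit reward.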
## Model description
More information needed
## Intended uses & limitations
More information needed
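
A minimal inference sketch (not part of the original card): the repository id below is a placeholder for wherever this checkpoint is hosted, and the prompt format is the Mistral-Instruct chat template inherited from the base model.

```python
# Minimal inference sketch; the repo id below is a hypothetical placeholder,
# not the documented location of this checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/mistralit2_1000_STEPS_1e5_rate_0.1_beta_DPO"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Mistral-Instruct models use the [INST] ... [/INST] format; the tokenizer's
# chat template applies it for us.
messages = [{"role": "user", "content": "Summarize the idea behind DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```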
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
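
A hedged sketch of how the hyperparameters above map onto TRL's `DPOTrainer`, assuming a TRL release contemporaneous with Transformers 4.38 (where `beta` and `tokenizer` are passed to the trainer directly). The preference dataset, output directory, and `beta=0.1` (inferred from the "0.1_beta" part of the model name) are assumptions, not documented settings.

```python
# Sketch only: dataset and paths are placeholders, beta is inferred from the model name.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_model = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(base_model)
ref_model = AutoModelForCausalLM.from_pretrained(base_model)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Placeholder preference dataset with "prompt", "chosen", and "rejected" columns;
# split names depend on the dataset actually used.
dataset = load_dataset("your-username/your-preference-dataset")

training_args = TrainingArguments(
    output_dir="mistralit2_1000_STEPS_1e5_rate_0.1_beta_DPO",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # effective train batch size of 8 on a single device
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=training_args,
    beta=0.1,                        # assumed from "0.1_beta" in the model name
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```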
### Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
1.1357 | 0.1 | 50 | 1.1734 | -1.9602 | -1.6509 | 0.3582 | -0.3094 | -45.0812 | -42.9883 | -3.1053 | -3.1052 |
1.7275 | 0.2 | 100 | 1.5539 | -4.8260 | -4.4502 | 0.3978 | -0.3758 | -73.0739 | -71.6456 | -2.7839 | -2.7839 |
1.6716 | 0.29 | 150 | 1.4805 | -4.2682 | -3.8441 | 0.3890 | -0.4241 | -67.0136 | -66.0676 | -3.8634 | -3.8634 |
1.9883 | 0.39 | 200 | 1.4624 | -4.1549 | -3.7121 | 0.3648 | -0.4429 | -65.6932 | -64.9352 | -4.6023 | -4.6023 |
1.2968 | 0.49 | 250 | 1.4720 | -4.1636 | -3.7323 | 0.3802 | -0.4312 | -65.8957 | -65.0215 | -4.0699 | -4.0699 |
1.5145 | 0.59 | 300 | 1.4656 | -4.1401 | -3.6836 | 0.3626 | -0.4564 | -65.4088 | -64.7864 | -4.8231 | -4.8231 |
1.7123 | 0.68 | 350 | 1.4617 | -4.1237 | -3.6671 | 0.3670 | -0.4567 | -65.2432 | -64.6233 | -4.7696 | -4.7696 |
1.295 | 0.78 | 400 | 1.4632 | -4.1764 | -3.7222 | 0.3714 | -0.4543 | -65.7941 | -65.1502 | -4.9799 | -4.9799 |
1.405 | 0.88 | 450 | 1.4666 | -4.1922 | -3.7464 | 0.3714 | -0.4458 | -66.0363 | -65.3076 | -5.0856 | -5.0856 |
1.9129 | 0.98 | 500 | 1.4701 | -4.2370 | -3.7742 | 0.3648 | -0.4628 | -66.3146 | -65.7560 | -5.1195 | -5.1195 |
1.2959 | 1.07 | 550 | 1.4889 | -4.3597 | -3.8796 | 0.3692 | -0.4802 | -67.3681 | -66.9833 | -5.1899 | -5.1899 |
1.2707 | 1.17 | 600 | 1.5193 | -4.6364 | -4.1231 | 0.3582 | -0.5133 | -69.8035 | -69.7498 | -5.9136 | -5.9136 |
1.3242 | 1.27 | 650 | 1.5168 | -4.6159 | -4.1101 | 0.3538 | -0.5057 | -69.6739 | -69.5444 | -5.3603 | -5.3603 |
1.397 | 1.37 | 700 | 2.1272 | -6.5216 | -6.2977 | 0.4022 | -0.2239 | -91.5493 | -88.6020 | -3.4923 | -3.4922 |
1.3107 | 1.46 | 750 | 1.4798 | -4.5654 | -4.0673 | 0.3626 | -0.4981 | -69.2450 | -69.0399 | -5.4624 | -5.4624 |
1.2491 | 1.56 | 800 | 1.4610 | -4.8769 | -4.3575 | 0.3648 | -0.5193 | -72.1476 | -72.1544 | -5.2893 | -5.2893 |
1.3924 | 1.66 | 850 | 1.4805 | -5.8437 | -5.2709 | 0.3473 | -0.5728 | -81.2817 | -81.8233 | -5.6057 | -5.6058 |
1.1725 | 1.76 | 900 | 1.4957 | -6.2498 | -5.6711 | 0.3626 | -0.5787 | -85.2834 | -85.8838 | -5.8532 | -5.8531 |
1.2113 | 1.86 | 950 | 1.4937 | -6.2249 | -5.6485 | 0.3626 | -0.5763 | -85.0578 | -85.6343 | -5.8631 | -5.8630 |
1.5057 | 1.95 | 1000 | 1.4935 | -6.2215 | -5.6448 | 0.3626 | -0.5767 | -85.0207 | -85.6008 | -5.8605 | -5.8604 |
### Framework versions
- Transformers 4.38.2
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2