---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: mistralit2_1000_STEPS_1e5_rate_0.1_beta_DPO
  results: []
---

# mistralit2_1000_STEPS_1e5_rate_0.1_beta_DPO

This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.4935
- Rewards/chosen: -6.2215
- Rewards/rejected: -5.6448
- Rewards/accuracies: 0.3626
- Rewards/margins: -0.5767
- Logps/rejected: -85.0207
- Logps/chosen: -85.6008
- Logits/rejected: -5.8605
- Logits/chosen: -5.8604

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
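For reference, a minimal sketch of how a run with these hyperparameters could be set up using TRL's `DPOTrainer` (TRL 0.7.x-era API, matching the Transformers 4.38 framework version below). The training dataset is not documented here, so `"your/preference-dataset"` is a placeholder; the `beta=0.1` value is inferred from the `0.1_beta` in the model name.

```python
# Sketch only: the dataset and exact training script for this card are unknown.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Placeholder: a preference dataset with "prompt", "chosen", and "rejected" columns.
dataset = load_dataset("your/preference-dataset")

training_args = TrainingArguments(
    output_dir="mistralit2_1000_STEPS_1e5_rate_0.1_beta_DPO",
    per_device_train_batch_size=4,   # train_batch_size: 4
    per_device_eval_batch_size=1,    # eval_batch_size: 1
    gradient_accumulation_steps=2,   # total_train_batch_size: 8
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    evaluation_strategy="steps",
    eval_steps=50,                   # the results table logs evals every 50 steps
)

trainer = DPOTrainer(
    model,
    ref_model=None,      # with no ref_model, TRL uses a frozen copy of `model`
    args=training_args,
    beta=0.1,            # inferred from "0.1_beta" in the model name
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```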
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 1.1357        | 0.1   | 50   | 1.1734          | -1.9602        | -1.6509          | 0.3582             | -0.3094         | -45.0812       | -42.9883     | -3.1053         | -3.1052       |
| 1.7275        | 0.2   | 100  | 1.5539          | -4.8260        | -4.4502          | 0.3978             | -0.3758         | -73.0739       | -71.6456     | -2.7839         | -2.7839       |
| 1.6716        | 0.29  | 150  | 1.4805          | -4.2682        | -3.8441          | 0.3890             | -0.4241         | -67.0136       | -66.0676     | -3.8634         | -3.8634       |
| 1.9883        | 0.39  | 200  | 1.4624          | -4.1549        | -3.7121          | 0.3648             | -0.4429         | -65.6932       | -64.9352     | -4.6023         | -4.6023       |
| 1.2968        | 0.49  | 250  | 1.4720          | -4.1636        | -3.7323          | 0.3802             | -0.4312         | -65.8957       | -65.0215     | -4.0699         | -4.0699       |
| 1.5145        | 0.59  | 300  | 1.4656          | -4.1401        | -3.6836          | 0.3626             | -0.4564         | -65.4088       | -64.7864     | -4.8231         | -4.8231       |
| 1.7123        | 0.68  | 350  | 1.4617          | -4.1237        | -3.6671          | 0.3670             | -0.4567         | -65.2432       | -64.6233     | -4.7696         | -4.7696       |
| 1.295         | 0.78  | 400  | 1.4632          | -4.1764        | -3.7222          | 0.3714             | -0.4543         | -65.7941       | -65.1502     | -4.9799         | -4.9799       |
| 1.405         | 0.88  | 450  | 1.4666          | -4.1922        | -3.7464          | 0.3714             | -0.4458         | -66.0363       | -65.3076     | -5.0856         | -5.0856       |
| 1.9129        | 0.98  | 500  | 1.4701          | -4.2370        | -3.7742          | 0.3648             | -0.4628         | -66.3146       | -65.7560     | -5.1195         | -5.1195       |
| 1.2959        | 1.07  | 550  | 1.4889          | -4.3597        | -3.8796          | 0.3692             | -0.4802         | -67.3681       | -66.9833     | -5.1899         | -5.1899       |
| 1.2707        | 1.17  | 600  | 1.5193          | -4.6364        | -4.1231          | 0.3582             | -0.5133         | -69.8035       | -69.7498     | -5.9136         | -5.9136       |
| 1.3242        | 1.27  | 650  | 1.5168          | -4.6159        | -4.1101          | 0.3538             | -0.5057         | -69.6739       | -69.5444     | -5.3603         | -5.3603       |
| 1.397         | 1.37  | 700  | 2.1272          | -6.5216        | -6.2977          | 0.4022             | -0.2239         | -91.5493       | -88.6020     | -3.4923         | -3.4922       |
| 1.3107        | 1.46  | 750  | 1.4798          | -4.5654        | -4.0673          | 0.3626             | -0.4981         | -69.2450       | -69.0399     | -5.4624         | -5.4624       |
| 1.2491        | 1.56  | 800  | 1.4610          | -4.8769        | -4.3575          | 0.3648             | -0.5193         | -72.1476       | -72.1544     | -5.2893         | -5.2893       |
| 1.3924        | 1.66  | 850  | 1.4805          | -5.8437        | -5.2709          | 0.3473             | -0.5728         | -81.2817       | -81.8233     | -5.6057         | -5.6058       |
| 1.1725        | 1.76  | 900  | 1.4957          | -6.2498        | -5.6711          | 0.3626             | -0.5787         | -85.2834       | -85.8838     | -5.8532         | -5.8531       |
| 1.2113        | 1.86  | 950  | 1.4937          | -6.2249        | -5.6485          | 0.3626             | -0.5763         | -85.0578       | -85.6343     | -5.8631         | -5.8630       |
| 1.5057        | 1.95  | 1000 | 1.4935          | -6.2215        | -5.6448          | 0.3626             | -0.5767         | -85.0207       | -85.6008     | -5.8605         | -5.8604       |

### Framework versions

- Transformers 4.38.2
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2
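## Inference example

A minimal inference sketch. The repo id below is a placeholder (the Hub path for this checkpoint is not stated in the card); since the base model is Mistral-7B-Instruct-v0.2, the fine-tune is assumed to inherit its chat template. `device_map="auto"` requires `accelerate` to be installed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: replace with the actual Hub repo id of this checkpoint.
model_id = "mistralit2_1000_STEPS_1e5_rate_0.1_beta_DPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Format the prompt with the instruct chat template inherited from the base model.
messages = [{"role": "user", "content": "Summarize what DPO training does."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```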