---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: mistralit2_1000_STEPS_rate_1e6_03_Beta_DPO
  results: []
---

# mistralit2_1000_STEPS_rate_1e6_03_Beta_DPO

This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.3501
- Rewards/chosen: -4.6533
- Rewards/rejected: -7.2695
- Rewards/accuracies: 0.6044
- Rewards/margins: 2.6162
- Logps/rejected: -52.8039
- Logps/chosen: -38.8969
- Logits/rejected: -2.8818
- Logits/chosen: -2.8827

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
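As a reproducibility aid, here is a minimal sketch of how these settings map onto TRL's `DPOTrainer`, assuming a TRL release contemporary with the framework versions listed below (where the trainer still accepts `beta` and `tokenizer` directly). The preference dataset is not documented in this card, so the `prompt`/`chosen`/`rejected` rows are placeholders, and `beta=0.3` is inferred from the model name rather than stated anywhere above:

```python
# Minimal sketch, not the author's actual script. The training data is
# undocumented, so the dataset below is a placeholder with the columns
# DPOTrainer expects.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder preference data: one (prompt, chosen, rejected) triple.
train_ds = Dataset.from_dict({
    "prompt": ["..."],
    "chosen": ["..."],
    "rejected": ["..."],
})

args = TrainingArguments(
    output_dir="mistralit2_1000_STEPS_rate_1e6_03_Beta_DPO",
    learning_rate=1e-6,
    per_device_train_batch_size=4,   # train_batch_size above
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # total_train_batch_size = 4 * 2 = 8
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,                  # training_steps above
    seed=42,                         # Adam betas/epsilon are the listed defaults
)

trainer = DPOTrainer(
    model,
    ref_model=None,         # TRL clones a frozen reference model when None
    args=args,
    beta=0.3,               # inferred from the model name "03_Beta"; not stated in the card
    train_dataset=train_ds,
    eval_dataset=train_ds,  # placeholder; the card's eval set is not documented
    tokenizer=tokenizer,
)
trainer.train()
```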
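For context on the reward columns in the results table that follows: DPO trains no explicit reward model. TRL logs the implicit reward of a response \\(y\\) for prompt \\(x\\) as \\(\beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}\\), averaged over the chosen and rejected responses respectively; Rewards/margins is Rewards/chosen minus Rewards/rejected (final row: \\(-4.6533 - (-7.2695) = 2.6162\\)), and Rewards/accuracies is the fraction of pairs whose chosen response earns the higher reward. The objective being minimized, with \\(\beta = 0.3\\) as inferred from the model name, is the standard DPO loss:

$$
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$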
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6609        | 0.1   | 50   | 0.7439          | -0.3799        | -0.6639          | 0.5363             | 0.2840          | -30.7855       | -24.6521     | -2.8212         | -2.8215       |
| 0.7223        | 0.2   | 100  | 1.2179          | -3.4197        | -4.5833          | 0.5670             | 1.1636          | -43.8500       | -34.7847     | -2.4935         | -2.4943       |
| 1.5151        | 0.29  | 150  | 1.3451          | -4.6461        | -5.3198          | 0.4923             | 0.6737          | -46.3050       | -38.8727     | -2.7810         | -2.7816       |
| 1.5249        | 0.39  | 200  | 1.5370          | -4.3700        | -4.3686          | 0.4659             | -0.0014         | -43.1345       | -37.9527     | -2.9607         | -2.9612       |
| 1.3975        | 0.49  | 250  | 1.2806          | -3.4083        | -3.9853          | 0.5319             | 0.5769          | -41.8567       | -34.7470     | -2.9314         | -2.9319       |
| 1.3304        | 0.59  | 300  | 1.3357          | -2.0104        | -2.3692          | 0.4945             | 0.3588          | -36.4698       | -30.0870     | -2.9631         | -2.9635       |
| 1.0439        | 0.68  | 350  | 1.2763          | -0.5270        | -0.8889          | 0.5077             | 0.3619          | -31.5354       | -25.1425     | -2.8440         | -2.8443       |
| 1.4598        | 0.78  | 400  | 1.2025          | -2.3552        | -3.1289          | 0.5560             | 0.7737          | -39.0019       | -31.2365     | -3.1671         | -3.1675       |
| 0.8046        | 0.88  | 450  | 1.2610          | -2.5219        | -3.3122          | 0.5538             | 0.7903          | -39.6132       | -31.7922     | -2.8903         | -2.8908       |
| 0.9395        | 0.98  | 500  | 1.1880          | -1.6006        | -2.5141          | 0.5451             | 0.9135          | -36.9527       | -28.7210     | -2.7295         | -2.7300       |
| 0.239         | 1.07  | 550  | 1.1556          | -2.0692        | -3.6279          | 0.5868             | 1.5587          | -40.6656       | -30.2832     | -2.8301         | -2.8308       |
| 0.1348        | 1.17  | 600  | 1.3248          | -3.6765        | -5.8923          | 0.5978             | 2.2158          | -48.2133       | -35.6409     | -2.8392         | -2.8400       |
| 0.328         | 1.27  | 650  | 1.2982          | -3.5842        | -5.5884          | 0.5868             | 2.0042          | -47.2005       | -35.3331     | -2.8786         | -2.8794       |
| 0.3605        | 1.37  | 700  | 1.2960          | -4.0655        | -6.4030          | 0.6000             | 2.3374          | -49.9156       | -36.9376     | -2.8812         | -2.8820       |
| 0.1389        | 1.46  | 750  | 1.3185          | -4.2670        | -6.7599          | 0.5956             | 2.4929          | -51.1054       | -37.6093     | -2.8897         | -2.8905       |
| 0.1871        | 1.56  | 800  | 1.3483          | -4.5542        | -7.1419          | 0.5978             | 2.5877          | -52.3788       | -38.5665     | -2.8779         | -2.8788       |
| 0.3556        | 1.66  | 850  | 1.3507          | -4.6209        | -7.2288          | 0.6000             | 2.6080          | -52.6684       | -38.7887     | -2.8809         | -2.8817       |
| 0.4099        | 1.76  | 900  | 1.3517          | -4.6482        | -7.2597          | 0.6022             | 2.6114          | -52.7713       | -38.8799     | -2.8817         | -2.8826       |
| 0.3996        | 1.86  | 950  | 1.3491          | -4.6540        | -7.2682          | 0.6044             | 2.6142          | -52.7997       | -38.8992     | -2.8818         | -2.8827       |
| 0.2013        | 1.95  | 1000 | 1.3501          | -4.6533        | -7.2695          | 0.6044             | 2.6162          | -52.8039       | -38.8969     | -2.8818         | -2.8827       |

### Framework versions

- Transformers 4.38.2
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2
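## Example usage

A minimal inference sketch; `model_id` is a placeholder for the published repo id or a local checkpoint path, and the generation settings are illustrative rather than taken from this card:

```python
# Minimal inference sketch for the fine-tuned checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralit2_1000_STEPS_rate_1e6_03_Beta_DPO"  # replace with the hub repo id or a local path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# The base model ships an instruct chat template, which the fine-tune inherits.
messages = [{"role": "user", "content": "Summarize DPO in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```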