---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: mistralit2_1000_STEPS_1e5_rate_0.1_beta_DPO
  results: []
---

# mistralit2_1000_STEPS_1e5_rate_0.1_beta_DPO

This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9939
- Rewards/chosen: -3.9532
- Rewards/rejected: -5.6547
- Rewards/accuracies: 0.6000
- Rewards/margins: 1.7015
- Logps/rejected: -85.1197
- Logps/chosen: -62.9180
- Logits/rejected: -2.0229
- Logits/chosen: -2.0243

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
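The original training script is not included in this card. The following is a minimal sketch of how the hyperparameters above map onto trl's `DPOTrainer` (using the trl API contemporary with Transformers 4.38.2). The dataset path is a placeholder, and `beta=0.1` is an assumption inferred from the `0.1_beta` in the model name rather than documented here.

```python
# Sketch only: the card does not ship the actual training script.
# Assumptions are marked below; the preference dataset is a placeholder.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Mistral ships no pad token by default

# Placeholder: the training data is not documented in this card. Any dataset
# with "prompt", "chosen", and "rejected" string columns fits DPOTrainer.
train_dataset = load_dataset("your/preference-dataset", split="train")
eval_dataset = load_dataset("your/preference-dataset", split="test")

args = TrainingArguments(
    output_dir="mistralit2_1000_STEPS_1e5_rate_0.1_beta_DPO",
    learning_rate=1e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size 8
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    evaluation_strategy="steps",
    eval_steps=50,                  # matches the 50-step cadence in the results table below
    remove_unused_columns=False,    # required by trl's DPO data collator
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # trl makes a frozen copy of `model` as the DPO reference
    beta=0.1,        # assumption: inferred from "0.1_beta" in the model name
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```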
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6073        | 0.1   | 50   | 0.6623          | -1.2716        | -1.5743          | 0.5736             | 0.3026          | -44.3150       | -36.1020     | -2.8014         | -2.8019       |
| 0.7223        | 0.2   | 100  | 0.7934          | -3.0203        | -3.2538          | 0.5077             | 0.2336          | -61.1108       | -53.5883     | -2.4237         | -2.4243       |
| 0.8563        | 0.29  | 150  | 0.7580          | -1.8675        | -2.3470          | 0.5604             | 0.4795          | -52.0427       | -42.0607     | -2.5521         | -2.5529       |
| 0.7701        | 0.39  | 200  | 0.7631          | -1.8702        | -2.1583          | 0.5231             | 0.2882          | -50.1556       | -42.0875     | -2.7052         | -2.7056       |
| 0.8749        | 0.49  | 250  | 0.7941          | -2.4787        | -2.6066          | 0.4879             | 0.1279          | -54.6385       | -48.1731     | -2.8184         | -2.8189       |
| 0.6954        | 0.59  | 300  | 0.8039          | -1.5721        | -1.9872          | 0.5473             | 0.4151          | -48.4439       | -39.1064     | -2.8263         | -2.8268       |
| 0.733         | 0.68  | 350  | 0.7751          | -0.5753        | -1.0891          | 0.5253             | 0.5138          | -39.4632       | -29.1387     | -2.7587         | -2.7591       |
| 0.8256        | 0.78  | 400  | 0.7376          | -1.2950        | -1.7911          | 0.5516             | 0.4962          | -46.4838       | -36.3354     | -2.9702         | -2.9707       |
| 0.6485        | 0.88  | 450  | 0.7344          | -1.7798        | -2.3960          | 0.5692             | 0.6162          | -52.5322       | -41.1838     | -2.7167         | -2.7174       |
| 0.612         | 0.98  | 500  | 0.7051          | -1.3500        | -2.0968          | 0.5978             | 0.7467          | -49.5400       | -36.8863     | -2.5131         | -2.5138       |
| 0.2108        | 1.07  | 550  | 0.7799          | -2.0131        | -3.4580          | 0.6418             | 1.4449          | -63.1524       | -43.5171     | -2.2469         | -2.2482       |
| 0.1378        | 1.17  | 600  | 0.9314          | -3.4717        | -5.1214          | 0.6198             | 1.6497          | -79.7863       | -58.1027     | -1.9917         | -1.9933       |
| 0.188         | 1.27  | 650  | 0.9857          | -3.6647        | -5.3449          | 0.6198             | 1.6803          | -82.0219       | -60.0328     | -1.9585         | -1.9601       |
| 0.3739        | 1.37  | 700  | 1.0046          | -3.6506        | -5.3352          | 0.6176             | 1.6846          | -81.9245       | -59.8915     | -2.0334         | -2.0349       |
| 0.0428        | 1.46  | 750  | 0.9881          | -3.8094        | -5.4955          | 0.6088             | 1.6861          | -83.5278       | -61.4803     | -2.0272         | -2.0287       |
| 0.131         | 1.56  | 800  | 0.9900          | -3.9653        | -5.6306          | 0.6022             | 1.6653          | -84.8782       | -63.0390     | -2.0228         | -2.0242       |
| 0.1558        | 1.66  | 850  | 0.9943          | -3.9735        | -5.6628          | 0.6000             | 1.6893          | -85.2000       | -63.1207     | -2.0177         | -2.0191       |
| 0.1876        | 1.76  | 900  | 0.9939          | -3.9576        | -5.6566          | 0.6000             | 1.6989          | -85.1381       | -62.9622     | -2.0227         | -2.0241       |
| 0.1415        | 1.86  | 950  | 0.9945          | -3.9552        | -5.6536          | 0.6022             | 1.6984          | -85.1084       | -62.9377     | -2.0232         | -2.0246       |
| 0.1163        | 1.95  | 1000 | 0.9939          | -3.9532        | -5.6547          | 0.6000             | 1.7015          | -85.1197       | -62.9180     | -2.0229         | -2.0243       |

### Framework versions

- Transformers 4.38.2
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2
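## How to use

A minimal inference sketch follows. The repo id below is a placeholder: substitute the actual Hub path (or local directory) where this checkpoint is hosted.

```python
# Minimal inference sketch. The repo id is a placeholder, not the real
# location of this checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-username/mistralit2_1000_STEPS_1e5_rate_0.1_beta_DPO"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.float16, device_map="auto"
)

# Mistral-Instruct expects [INST] ... [/INST] formatting; the tokenizer's
# chat template applies it for us.
messages = [{"role": "user", "content": "Summarize DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```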