---
license: apache-2.0
base_model: tsavage68/mistralit2_1000_STEPS_5e7_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: Mistral2_1000_STEPS_05beta_CDPOSFT
  results: []
---

# Mistral2_1000_STEPS_05beta_CDPOSFT

This model is a fine-tuned version of [tsavage68/mistralit2_1000_STEPS_5e7_SFT](https://huggingface.co/tsavage68/mistralit2_1000_STEPS_5e7_SFT) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.5804
- Rewards/chosen: 1.1615
- Rewards/rejected: 0.9028
- Rewards/accuracies: 0.4286
- Rewards/margins: 0.2587
- Logps/rejected: -75.7158
- Logps/chosen: -73.1790
- Logits/rejected: -1.8951
- Logits/chosen: -1.8951

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 2.0435 | 0.0977 | 50 | 1.6546 | -0.2111 | -0.0540 | 0.3868 | -0.1572 | -77.6294 | -75.9242 | -1.6195 | -1.6195 |
| 3.1098 | 0.1953 | 100 | 1.7215 | -0.1670 | -0.5209 | 0.4286 | 0.3539 | -78.5632 | -75.8359 | -0.7416 | -0.7416 |
| 1.8949 | 0.2930 | 150 | 1.6841 | 2.1671 | 1.9786 | 0.4154 | 0.1886 | -73.5644 | -71.1677 | -1.7619 | -1.7619 |
| 1.4406 | 0.3906 | 200 | 1.6936 | 2.3177 | 2.1054 | 0.4264 | 0.2124 | -73.3108 | -70.8665 | -2.2879 | -2.2879 |
| 1.5623 | 0.4883 | 250 | 1.5911 | 0.8418 | 0.4811 | 0.4396 | 0.3607 | -76.5593 | -73.8184 | -1.5834 | -1.5834 |
| 1.8884 | 0.5859 | 300 | 1.5747 | 1.4552 | 1.2105 | 0.4418 | 0.2447 | -75.1005 | -72.5916 | -1.6640 | -1.6640 |
| 1.4373 | 0.6836 | 350 | 1.5569 | 1.3020 | 1.0909 | 0.4198 | 0.2111 | -75.3397 | -72.8979 | -1.9137 | -1.9136 |
| 1.4732 | 0.7812 | 400 | 1.5216 | 1.0023 | 0.6676 | 0.4571 | 0.3347 | -76.1863 | -73.4973 | -1.9794 | -1.9794 |
| 1.9109 | 0.8789 | 450 | 1.5502 | 1.3520 | 0.9986 | 0.4505 | 0.3534 | -75.5243 | -72.7979 | -1.8076 | -1.8076 |
| 1.4744 | 0.9766 | 500 | 1.5531 | 1.3605 | 1.1014 | 0.4264 | 0.2591 | -75.3186 | -72.7809 | -1.9385 | -1.9385 |
| 1.2615 | 1.0742 | 550 | 1.6623 | 0.6530 | 0.4114 | 0.4242 | 0.2415 | -76.6986 | -74.1960 | -2.3949 | -2.3949 |
| 1.8019 | 1.1719 | 600 | 1.6240 | 0.8707 | 0.6200 | 0.4308 | 0.2507 | -76.2815 | -73.7606 | -1.6149 | -1.6149 |
| 1.2202 | 1.2695 | 650 | 1.5993 | 1.1246 | 0.9014 | 0.4330 | 0.2233 | -75.7188 | -73.2527 | -1.8964 | -1.8964 |
| 1.0924 | 1.3672 | 700 | 1.5922 | 1.3888 | 1.1674 | 0.4242 | 0.2214 | -75.1866 | -72.7243 | -1.8455 | -1.8455 |
| 0.8059 | 1.4648 | 750 | 1.6004 | 1.1205 | 0.8834 | 0.4396 | 0.2371 | -75.7547 | -73.2610 | -1.9415 | -1.9415 |
| 0.9489 | 1.5625 | 800 | 1.5917 | 1.2725 | 1.0232 | 0.4264 | 0.2493 | -75.4751 | -72.9570 | -1.9293 | -1.9293 |
| 1.2564 | 1.6602 | 850 | 1.5797 | 1.1856 | 0.9286 | 0.4264 | 0.2570 | -75.6643 | -73.1308 | -1.8894 | -1.8894 |
| 1.2613 | 1.7578 | 900 | 1.5806 | 1.1682 | 0.9110 | 0.4308 | 0.2572 | -75.6995 | -73.1655 | -1.8963 | -1.8963 |
| 1.1197 | 1.8555 | 950 | 1.5804 | 1.1615 | 0.9030 | 0.4286 | 0.2585 | -75.7156 | -73.1791 | -1.8955 | -1.8955 |
| 0.7665 | 1.9531 | 1000 | 1.5804 | 1.1615 | 0.9028 | 0.4286 | 0.2587 | -75.7158 | -73.1790 | -1.8951 | -1.8951 |

### Framework versions

- Transformers 4.40.1
- Pytorch 2.0.0+cu117
- Datasets 2.19.0
- Tokenizers 0.19.1
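### Training setup sketch

The training script is not included in this card. The sketch below shows how a comparable run could be configured with `trl`'s `DPOTrainer` (0.8-era API, matching Transformers 4.40.1) using the hyperparameters listed above. It is a sketch under stated assumptions, not the author's script: `beta=0.5` is inferred from the `05beta` part of the model name, the cDPO-style `label_smoothing` from the `CDPO` suffix (exact value unknown), and the preference dataset is a placeholder since the card does not name it.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "tsavage68/mistralit2_1000_STEPS_5e7_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder: the card does not identify the preference dataset.
# It must provide "prompt", "chosen", and "rejected" columns.
train_dataset = load_dataset("my_preference_dataset", split="train")

args = TrainingArguments(
    output_dir="Mistral2_1000_STEPS_05beta_CDPOSFT",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size 8
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the defaults.
)

trainer = DPOTrainer(
    model,
    ref_model=None,       # trl snapshots a frozen reference copy of the policy
    args=args,
    beta=0.5,             # assumption: inferred from "05beta" in the model name
    label_smoothing=0.1,  # assumption: cDPO smoothing implied by "CDPO"; true value unknown
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```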
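## How to use

A minimal inference sketch with `transformers`. The repository id below is an assumption (the model name under the base model's namespace), and the chat template is the Mistral-Instruct format inherited from the base model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id: model name under the base model's namespace.
model_id = "tsavage68/Mistral2_1000_STEPS_05beta_CDPOSFT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"  # device_map needs accelerate
)

messages = [{"role": "user", "content": "Explain what DPO fine-tuning changes about a model."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```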