---
license: llama3
base_model: tsavage68/Transaminitis_L3_1000rate_1e7_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: Transaminitis_L3_1000steps_1e8rate_03beta_CSFTDPO
  results: []
---

# Transaminitis_L3_1000steps_1e8rate_03beta_CSFTDPO

This model is a fine-tuned version of [tsavage68/Transaminitis_L3_1000rate_1e7_SFT](https://huggingface.co/tsavage68/Transaminitis_L3_1000rate_1e7_SFT) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6938
- Rewards/chosen: -0.0044
- Rewards/rejected: -0.0034
- Rewards/accuracies: 0.4600
- Rewards/margins: -0.0010
- Logps/rejected: -18.5659
- Logps/chosen: -18.5488
- Logits/rejected: -1.0662
- Logits/chosen: -1.0650

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-08
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
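For readers who want to set up a comparable run, the sketch below wires these hyperparameters into TRL's `DPOTrainer`. It is an assumption-laden reconstruction, not the original training script: the preference dataset path is a placeholder, `beta=0.3` is inferred only from the model name, and the exact keyword arguments may differ across TRL releases.

```python
# Hypothetical reconstruction of the run described above, NOT the original script.
# Assumes a TRL release (contemporary with transformers 4.40) in which DPOTrainer
# still accepts `beta` and `tokenizer` directly.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "tsavage68/Transaminitis_L3_1000rate_1e7_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder: the card does not name the preference dataset.
# DPOTrainer expects "prompt", "chosen", and "rejected" columns.
train_dataset = load_dataset("json", data_files="preference_pairs.jsonl")["train"]

args = TrainingArguments(
    output_dir="Transaminitis_L3_1000steps_1e8rate_03beta_CSFTDPO",
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # effective train batch size of 4
    learning_rate=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=args,
    beta=0.3,  # assumption: read off the "03beta" suffix in the model name
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```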
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6931 | 0.2 | 25 | 0.6931 | 0.0003 | 0.0002 | 0.0100 | 0.0001 | -18.5542 | -18.5333 | -1.0657 | -1.0646 |
| 0.6943 | 0.4 | 50 | 0.6935 | -0.0007 | -0.0003 | 0.4600 | -0.0004 | -18.5558 | -18.5367 | -1.0656 | -1.0644 |
| 0.6957 | 0.6 | 75 | 0.6938 | 0.0071 | 0.0080 | 0.4400 | -0.0009 | -18.5281 | -18.5105 | -1.0654 | -1.0642 |
| 0.6953 | 0.8 | 100 | 0.6954 | 0.0028 | 0.0069 | 0.4900 | -0.0041 | -18.5318 | -18.5250 | -1.0656 | -1.0645 |
| 0.6934 | 1.0 | 125 | 0.6919 | 0.0084 | 0.0055 | 0.4600 | 0.0029 | -18.5364 | -18.5063 | -1.0643 | -1.0632 |
| 0.6983 | 1.2 | 150 | 0.6927 | 0.0042 | 0.0030 | 0.4800 | 0.0012 | -18.5447 | -18.5201 | -1.0653 | -1.0641 |
| 0.6949 | 1.4 | 175 | 0.6927 | 0.0065 | 0.0052 | 0.4800 | 0.0013 | -18.5374 | -18.5126 | -1.0657 | -1.0646 |
| 0.6897 | 1.6 | 200 | 0.6935 | 0.0063 | 0.0067 | 0.4800 | -0.0004 | -18.5323 | -18.5132 | -1.0660 | -1.0649 |
| 0.6935 | 1.8 | 225 | 0.6957 | -0.0013 | 0.0034 | 0.4300 | -0.0047 | -18.5435 | -18.5385 | -1.0650 | -1.0638 |
| 0.6877 | 2.0 | 250 | 0.6932 | 0.0011 | 0.0007 | 0.4900 | 0.0003 | -18.5522 | -18.5307 | -1.0650 | -1.0639 |
| 0.6916 | 2.2 | 275 | 0.6927 | 0.0018 | 0.0005 | 0.5700 | 0.0014 | -18.5532 | -18.5281 | -1.0656 | -1.0644 |
| 0.6941 | 2.4 | 300 | 0.6901 | 0.0026 | -0.0038 | 0.5400 | 0.0064 | -18.5675 | -18.5257 | -1.0655 | -1.0644 |
| 0.6912 | 2.6 | 325 | 0.6965 | 0.0013 | 0.0076 | 0.4200 | -0.0063 | -18.5294 | -18.5298 | -1.0658 | -1.0647 |
| 0.6875 | 2.8 | 350 | 0.6923 | 0.0009 | -0.0013 | 0.5400 | 0.0022 | -18.5592 | -18.5313 | -1.0654 | -1.0644 |
| 0.6921 | 3.0 | 375 | 0.6913 | 0.0116 | 0.0075 | 0.5300 | 0.0041 | -18.5297 | -18.4955 | -1.0657 | -1.0646 |
| 0.6928 | 3.2 | 400 | 0.6960 | 0.0035 | 0.0087 | 0.4900 | -0.0052 | -18.5258 | -18.5226 | -1.0662 | -1.0649 |
| 0.6945 | 3.4 | 425 | 0.6967 | 0.0049 | 0.0114 | 0.4600 | -0.0066 | -18.5165 | -18.5179 | -1.0654 | -1.0644 |
| 0.6899 | 3.6 | 450 | 0.6943 | 0.0076 | 0.0096 | 0.4700 | -0.0020 | -18.5227 | -18.5089 | -1.0658 | -1.0646 |
| 0.6933 | 3.8 | 475 | 0.6963 | 0.0045 | 0.0103 | 0.4500 | -0.0058 | -18.5204 | -18.5192 | -1.0651 | -1.0639 |
| 0.6967 | 4.0 | 500 | 0.6936 | 0.0034 | 0.0039 | 0.5100 | -0.0006 | -18.5416 | -18.5230 | -1.0657 | -1.0645 |
| 0.6915 | 4.2 | 525 | 0.6936 | -0.0023 | -0.0018 | 0.4600 | -0.0005 | -18.5607 | -18.5418 | -1.0658 | -1.0645 |
| 0.6893 | 4.4 | 550 | 0.6946 | 0.0002 | 0.0027 | 0.5100 | -0.0026 | -18.5455 | -18.5337 | -1.0661 | -1.0648 |
| 0.6938 | 4.6 | 575 | 0.6946 | 0.0070 | 0.0095 | 0.5700 | -0.0025 | -18.5230 | -18.5109 | -1.0659 | -1.0647 |
| 0.6924 | 4.8 | 600 | 0.6959 | 0.0050 | 0.0101 | 0.4800 | -0.0051 | -18.5209 | -18.5174 | -1.0661 | -1.0650 |
| 0.6878 | 5.0 | 625 | 0.6966 | 0.0041 | 0.0106 | 0.4500 | -0.0065 | -18.5193 | -18.5205 | -1.0660 | -1.0648 |
| 0.6935 | 5.2 | 650 | 0.6949 | 0.0079 | 0.0110 | 0.5100 | -0.0031 | -18.5180 | -18.5079 | -1.0659 | -1.0649 |
| 0.6953 | 5.4 | 675 | 0.6931 | 0.0071 | 0.0067 | 0.5100 | 0.0004 | -18.5325 | -18.5107 | -1.0662 | -1.0651 |
| 0.6947 | 5.6 | 700 | 0.6945 | -0.0049 | -0.0026 | 0.5100 | -0.0023 | -18.5632 | -18.5505 | -1.0662 | -1.0651 |
| 0.688 | 5.8 | 725 | 0.6933 | -0.0016 | -0.0017 | 0.4900 | 0.0001 | -18.5604 | -18.5394 | -1.0662 | -1.0650 |
| 0.6953 | 6.0 | 750 | 0.6937 | -0.0044 | -0.0037 | 0.4600 | -0.0007 | -18.5670 | -18.5489 | -1.0662 | -1.0651 |
| 0.6921 | 6.2 | 775 | 0.6933 | -0.0041 | -0.0042 | 0.4700 | 0.0001 | -18.5686 | -18.5478 | -1.0661 | -1.0650 |
| 0.6942 | 6.4 | 800 | 0.6934 | -0.0030 | -0.0028 | 0.4700 | -0.0003 | -18.5640 | -18.5443 | -1.0661 | -1.0650 |
| 0.69 | 6.6 | 825 | 0.6939 | -0.0039 | -0.0028 | 0.4600 | -0.0011 | -18.5640 | -18.5473 | -1.0662 | -1.0651 |
| 0.6928 | 6.8 | 850 | 0.6938 | -0.0044 | -0.0034 | 0.4600 | -0.0010 | -18.5659 | -18.5488 | -1.0662 | -1.0650 |
| 0.6957 | 7.0 | 875 | 0.6938 | -0.0044 | -0.0034 | 0.4600 | -0.0010 | -18.5659 | -18.5488 | -1.0662 | -1.0650 |
| 0.697 | 7.2 | 900 | 0.6938 | -0.0044 | -0.0034 | 0.4600 | -0.0010 | -18.5659 | -18.5488 | -1.0662 | -1.0650 |
| 0.693 | 7.4 | 925 | 0.6938 | -0.0044 | -0.0034 | 0.4600 | -0.0010 | -18.5659 | -18.5488 | -1.0662 | -1.0650 |
| 0.6964 | 7.6 | 950 | 0.6938 | -0.0044 | -0.0034 | 0.4600 | -0.0010 | -18.5659 | -18.5488 | -1.0662 | -1.0650 |
| 0.6934 | 7.8 | 975 | 0.6938 | -0.0044 | -0.0034 | 0.4600 | -0.0010 | -18.5659 | -18.5488 | -1.0662 | -1.0650 |
| 0.6865 | 8.0 | 1000 | 0.6938 | -0.0044 | -0.0034 | 0.4600 | -0.0010 | -18.5659 | -18.5488 | -1.0662 | -1.0650 |

### Framework versions

- Transformers 4.40.2
- Pytorch 2.0.0+cu117
- Datasets 2.19.1
- Tokenizers 0.19.1
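As a usage note, the checkpoint loads like any other `transformers` causal LM. The repository id and prompt below are assumptions for illustration only; the card does not document an expected prompt format.

```python
# Minimal inference sketch; repo id and prompt are placeholders, not confirmed by the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "tsavage68/Transaminitis_L3_1000steps_1e8rate_03beta_CSFTDPO"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.float16, device_map="auto")

inputs = tokenizer("Example prompt", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```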