---
library_name: transformers
license: apache-2.0
base_model: tsavage68/Na_M2_1000steps_1e7_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: Na_M2_1000steps_1e6rate_05beta_cSFTDPO
  results: []
---

# Na_M2_1000steps_1e6rate_05beta_cSFTDPO

This model is a fine-tuned version of [tsavage68/Na_M2_1000steps_1e7_SFT](https://huggingface.co/tsavage68/Na_M2_1000steps_1e7_SFT) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0000
- Rewards/chosen: 3.9080
- Rewards/rejected: -14.8092
- Rewards/accuracies: 1.0
- Rewards/margins: 18.7172
- Logps/rejected: -109.5417
- Logps/chosen: -40.3163
- Logits/rejected: -2.5104
- Logits/chosen: -2.5251

In TRL's DPO metrics, the reward columns are beta-scaled log-probability ratios of the policy against the reference model, so a reward accuracy of 1.0 with a loss of 0.0000 means the model ranks the chosen completion above the rejected one for every evaluation pair.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000

A hypothetical sketch of how these settings map onto TRL's `DPOConfig` follows the results table below.

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.0 | 0.2667 | 50 | 0.0000 | 2.9065 | -11.8929 | 1.0 | 14.7994 | -103.7091 | -42.3194 | -2.5220 | -2.5359 |
| 0.0 | 0.5333 | 100 | 0.0000 | 3.3215 | -13.0434 | 1.0 | 16.3649 | -106.0102 | -41.4894 | -2.5188 | -2.5330 |
| 0.0 | 0.8 | 150 | 0.0000 | 3.5466 | -13.5189 | 1.0 | 17.0655 | -106.9612 | -41.0391 | -2.5174 | -2.5317 |
| 0.0 | 1.0667 | 200 | 0.0000 | 3.6119 | -13.8627 | 1.0 | 17.4745 | -107.6487 | -40.9087 | -2.5143 | -2.5288 |
| 0.0 | 1.3333 | 250 | 0.0000 | 3.7085 | -14.0115 | 1.0 | 17.7200 | -107.9463 | -40.7154 | -2.5151 | -2.5296 |
| 0.0 | 1.6 | 300 | 0.0000 | 3.7952 | -14.1247 | 1.0 | 17.9199 | -108.1728 | -40.5420 | -2.5141 | -2.5286 |
| 0.0 | 1.8667 | 350 | 0.0000 | 3.7740 | -14.2878 | 1.0 | 18.0618 | -108.4989 | -40.5843 | -2.5139 | -2.5284 |
| 0.0 | 2.1333 | 400 | 0.0000 | 3.8254 | -14.4626 | 1.0 | 18.2880 | -108.8486 | -40.4816 | -2.5124 | -2.5269 |
| 0.0 | 2.4 | 450 | 0.0000 | 3.8372 | -14.5044 | 1.0 | 18.3416 | -108.9322 | -40.4579 | -2.5127 | -2.5273 |
| 0.0 | 2.6667 | 500 | 0.0000 | 3.8544 | -14.6284 | 1.0 | 18.4828 | -109.1802 | -40.4237 | -2.5115 | -2.5260 |
| 0.0 | 2.9333 | 550 | 0.0000 | 3.8744 | -14.6609 | 1.0 | 18.5353 | -109.2451 | -40.3835 | -2.5116 | -2.5262 |
| 0.0 | 3.2 | 600 | 0.0000 | 3.9000 | -14.7002 | 1.0 | 18.6002 | -109.3238 | -40.3324 | -2.5103 | -2.5249 |
| 0.0 | 3.4667 | 650 | 0.0000 | 3.9168 | -14.7537 | 1.0 | 18.6705 | -109.4308 | -40.2988 | -2.5105 | -2.5252 |
| 0.0 | 3.7333 | 700 | 0.0000 | 3.9175 | -14.7437 | 1.0 | 18.6612 | -109.4108 | -40.2974 | -2.5112 | -2.5259 |
| 0.0 | 4.0 | 750 | 0.0000 | 3.9128 | -14.7841 | 1.0 | 18.6969 | -109.4916 | -40.3068 | -2.5104 | -2.5250 |
| 0.0 | 4.2667 | 800 | 0.0000 | 3.9063 | -14.7726 | 1.0 | 18.6788 | -109.4685 | -40.3198 | -2.5107 | -2.5253 |
| 0.0 | 4.5333 | 850 | 0.0000 | 3.9178 | -14.7925 | 1.0 | 18.7103 | -109.5084 | -40.2969 | -2.5104 | -2.5251 |
| 0.0 | 4.8 | 900 | 0.0000 | 3.9074 | -14.8080 | 1.0 | 18.7154 | -109.5393 | -40.3176 | -2.5104 | -2.5251 |
| 0.0 | 5.0667 | 950 | 0.0000 | 3.9080 | -14.8092 | 1.0 | 18.7172 | -109.5417 | -40.3163 | -2.5104 | -2.5251 |
| 0.0 | 5.3333 | 1000 | 0.0000 | 3.9080 | -14.8092 | 1.0 | 18.7172 | -109.5417 | -40.3163 | -2.5104 | -2.5251 |
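The `trl` and `dpo` tags indicate this checkpoint was trained with TRL's `DPOTrainer`. The sketch below is a hypothetical reconstruction, not the author's script: it maps the hyperparameters listed above onto `DPOConfig`, infers `beta=0.5` only from the `05beta` suffix in the model name, and uses a placeholder dataset identifier because the training data is not documented on this card.

```python
# Hypothetical reconstruction of the training setup; not the author's script.
# Assumes a preference dataset with "prompt"/"chosen"/"rejected" columns.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/Na_M2_1000steps_1e7_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Hyperparameters copied from the list above; beta is an assumption inferred
# from the "05beta" suffix in the model name.
config = DPOConfig(
    output_dir="Na_M2_1000steps_1e6rate_05beta_cSFTDPO",
    beta=0.5,                       # assumption: from "05beta" in the name
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

# Placeholder: the card does not document the training data.
train_dataset = load_dataset("your/preference-dataset", split="train")

trainer = DPOTrainer(
    model=model,           # ref_model omitted: TRL creates a frozen copy
    args=config,
    train_dataset=train_dataset,
    tokenizer=tokenizer,   # TRL versions of this era; newer ones use processing_class
)
trainer.train()
```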
### Framework versions

- Transformers 4.44.2
- PyTorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1
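## Inference example

A minimal inference sketch, assuming the checkpoint loads as a standard causal language model via the `transformers` library named in the card metadata. The repo id is inferred from the author's namespace and the model name, and the prompt format expected by the underlying SFT model is not documented here.

```python
# Minimal inference sketch; assumes a standard causal LM checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id inferred from the author's namespace and the model name.
model_id = "tsavage68/Na_M2_1000steps_1e6rate_05beta_cSFTDPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Your prompt here", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```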