---
library_name: transformers
license: apache-2.0
base_model: tsavage68/Na_M2_1000steps_1e7_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: Na_M2_1000steps_1e6rate_05beta_cSFTDPO
  results: []
---
# Na_M2_1000steps_1e6rate_05beta_cSFTDPO
This model is a fine-tuned version of [tsavage68/Na_M2_1000steps_1e7_SFT](https://huggingface.co/tsavage68/Na_M2_1000steps_1e7_SFT) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0000
- Rewards/chosen: 3.9080
- Rewards/rejected: -14.8092
- Rewards/accuracies: 1.0
- Rewards/margins: 18.7172
- Logps/rejected: -109.5417
- Logps/chosen: -40.3163
- Logits/rejected: -2.5104
- Logits/chosen: -2.5251
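
Since the card does not yet document usage, here is a minimal inference sketch with `transformers`. The repo id is assumed from the `model-index` name above, and the prompt is an illustrative placeholder:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id, taken from the model-index name in this card.
model_id = "tsavage68/Na_M2_1000steps_1e6rate_05beta_cSFTDPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Placeholder prompt; adapt to however the SFT base model was prompted.
prompt = "Summarize the key findings in the following note:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```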
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
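
As a sketch, these hyperparameters map onto TRL's `DPOConfig`/`DPOTrainer` roughly as below. The dataset id is a hypothetical placeholder (the card does not name it), `beta=0.5` is inferred from the `05beta` model name rather than stated anywhere in the card, and the exact trainer kwargs vary by TRL version:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/Na_M2_1000steps_1e7_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

config = DPOConfig(
    output_dir="Na_M2_1000steps_1e6rate_05beta_cSFTDPO",
    beta=0.5,                       # assumed from the "05beta" model name
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size = 2 * 2 = 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    # Default AdamW optimizer already uses betas=(0.9, 0.999), eps=1e-8.
)

# Hypothetical preference dataset with prompt/chosen/rejected columns.
train_dataset = load_dataset("my_org/preference_pairs", split="train")

trainer = DPOTrainer(
    model=model,            # ref model defaults to a frozen copy of `model`
    args=config,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```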
### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.0           | 0.2667 | 50   | 0.0000          | 2.9065         | -11.8929         | 1.0                | 14.7994         | -103.7091      | -42.3194     | -2.5220         | -2.5359       |
| 0.0           | 0.5333 | 100  | 0.0000          | 3.3215         | -13.0434         | 1.0                | 16.3649         | -106.0102      | -41.4894     | -2.5188         | -2.5330       |
| 0.0           | 0.8    | 150  | 0.0000          | 3.5466         | -13.5189         | 1.0                | 17.0655         | -106.9612      | -41.0391     | -2.5174         | -2.5317       |
| 0.0           | 1.0667 | 200  | 0.0000          | 3.6119         | -13.8627         | 1.0                | 17.4745         | -107.6487      | -40.9087     | -2.5143         | -2.5288       |
| 0.0           | 1.3333 | 250  | 0.0000          | 3.7085         | -14.0115         | 1.0                | 17.7200         | -107.9463      | -40.7154     | -2.5151         | -2.5296       |
| 0.0           | 1.6    | 300  | 0.0000          | 3.7952         | -14.1247         | 1.0                | 17.9199         | -108.1728      | -40.5420     | -2.5141         | -2.5286       |
| 0.0           | 1.8667 | 350  | 0.0000          | 3.7740         | -14.2878         | 1.0                | 18.0618         | -108.4989      | -40.5843     | -2.5139         | -2.5284       |
| 0.0           | 2.1333 | 400  | 0.0000          | 3.8254         | -14.4626         | 1.0                | 18.2880         | -108.8486      | -40.4816     | -2.5124         | -2.5269       |
| 0.0           | 2.4    | 450  | 0.0000          | 3.8372         | -14.5044         | 1.0                | 18.3416         | -108.9322      | -40.4579     | -2.5127         | -2.5273       |
| 0.0           | 2.6667 | 500  | 0.0000          | 3.8544         | -14.6284         | 1.0                | 18.4828         | -109.1802      | -40.4237     | -2.5115         | -2.5260       |
| 0.0           | 2.9333 | 550  | 0.0000          | 3.8744         | -14.6609         | 1.0                | 18.5353         | -109.2451      | -40.3835     | -2.5116         | -2.5262       |
| 0.0           | 3.2    | 600  | 0.0000          | 3.9000         | -14.7002         | 1.0                | 18.6002         | -109.3238      | -40.3324     | -2.5103         | -2.5249       |
| 0.0           | 3.4667 | 650  | 0.0000          | 3.9168         | -14.7537         | 1.0                | 18.6705         | -109.4308      | -40.2988     | -2.5105         | -2.5252       |
| 0.0           | 3.7333 | 700  | 0.0000          | 3.9175         | -14.7437         | 1.0                | 18.6612         | -109.4108      | -40.2974     | -2.5112         | -2.5259       |
| 0.0           | 4.0    | 750  | 0.0000          | 3.9128         | -14.7841         | 1.0                | 18.6969         | -109.4916      | -40.3068     | -2.5104         | -2.5250       |
| 0.0           | 4.2667 | 800  | 0.0000          | 3.9063         | -14.7726         | 1.0                | 18.6788         | -109.4685      | -40.3198     | -2.5107         | -2.5253       |
| 0.0           | 4.5333 | 850  | 0.0000          | 3.9178         | -14.7925         | 1.0                | 18.7103         | -109.5084      | -40.2969     | -2.5104         | -2.5251       |
| 0.0           | 4.8    | 900  | 0.0000          | 3.9074         | -14.8080         | 1.0                | 18.7154         | -109.5393      | -40.3176     | -2.5104         | -2.5251       |
| 0.0           | 5.0667 | 950  | 0.0000          | 3.9080         | -14.8092         | 1.0                | 18.7172         | -109.5417      | -40.3163     | -2.5104         | -2.5251       |
| 0.0           | 5.3333 | 1000 | 0.0000          | 3.9080         | -14.8092         | 1.0                | 18.7172         | -109.5417      | -40.3163     | -2.5104         | -2.5251       |
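
For readers unfamiliar with these columns: they follow the implicit-reward bookkeeping of the DPO objective (Rafailov et al., 2023). With $\beta$ the DPO temperature (plausibly 0.5 given the `05beta` model name, though this card does not state it), the logged quantities are

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\text{Rewards/margins} = r_\theta(x, y_w) - r_\theta(x, y_l),
$$

where $y_w$ and $y_l$ are the chosen and rejected completions. The final row is consistent with this: 3.9080 - (-14.8092) = 18.7172.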
### Framework versions

- Transformers 4.44.2
- PyTorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1