---
license: apache-2.0
base_model: tsavage68/mistralit2_1000_STEPS_5e7_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: Mistral2_1000_STEPS_05beta_CDPOSFT
  results: []
---
# Mistral2_1000_STEPS_05beta_CDPOSFT

This model is a fine-tuned version of [tsavage68/mistralit2_1000_STEPS_5e7_SFT](https://huggingface.co/tsavage68/mistralit2_1000_STEPS_5e7_SFT) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.5804
- Rewards/chosen: 1.1615
- Rewards/rejected: 0.9028
- Rewards/accuracies: 0.4286
- Rewards/margins: 0.2587
- Logps/rejected: -75.7158
- Logps/chosen: -73.1790
- Logits/rejected: -1.8951
- Logits/chosen: -1.8951
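The `Rewards/*` metrics above are DPO's implicit rewards, β·(log π(y|x) − log π_ref(y|x)). As a minimal sketch, assuming β = 0.5 (suggested by "05beta" in the model name) and hypothetical log-probabilities chosen only for illustration, the loss and reward columns relate as follows; the `eps` argument is the conservative-DPO (cDPO) label smoothing suggested by "CDPO" in the name:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.5, eps=0.0):
    """Per-example DPO loss; eps > 0 gives the conservative (cDPO) variant."""
    r_chosen = beta * (pi_chosen - ref_chosen)        # -> Rewards/chosen
    r_rejected = beta * (pi_rejected - ref_rejected)  # -> Rewards/rejected
    margin = r_chosen - r_rejected                    # -> Rewards/margins
    sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
    # cDPO mixes the loss on the "flipped" preference with weight eps.
    loss = -(1 - eps) * math.log(sigmoid(margin)) - eps * math.log(sigmoid(-margin))
    return loss, r_chosen, r_rejected, margin

# Hypothetical policy/reference log-probabilities, for illustration only.
loss, r_c, r_r, margin = dpo_loss(-73.0, -76.0, -75.0, -77.0, beta=0.5)
```

With these toy inputs the implicit rewards come out to 1.0 (chosen) and 0.5 (rejected), so the margin is 0.5 and the plain-DPO loss is −log σ(0.5) ≈ 0.474.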
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
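The hyperparameters above map onto a standard `transformers` `TrainingArguments`. The sketch below is a hedged reconstruction, not the author's script: the output directory is an assumption, and the DPO-specific settings this card implies (β = 0.5 and cDPO label smoothing, passed to a TRL `DPOTrainer`) are not recorded here, so they are omitted.

```python
# Sketch: TrainingArguments matching the listed hyperparameters.
# Output directory is assumed; DPO-specific options (beta, label smoothing)
# would be configured separately on the TRL DPOTrainer.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Mistral2_1000_STEPS_05beta_CDPOSFT",  # assumed name
    learning_rate=1e-5,
    per_device_train_batch_size=4,   # effective batch 8 with accumulation
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,
    seed=42,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
)
```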
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 2.0435 | 0.0977 | 50 | 1.6546 | -0.2111 | -0.0540 | 0.3868 | -0.1572 | -77.6294 | -75.9242 | -1.6195 | -1.6195 |
| 3.1098 | 0.1953 | 100 | 1.7215 | -0.1670 | -0.5209 | 0.4286 | 0.3539 | -78.5632 | -75.8359 | -0.7416 | -0.7416 |
| 1.8949 | 0.2930 | 150 | 1.6841 | 2.1671 | 1.9786 | 0.4154 | 0.1886 | -73.5644 | -71.1677 | -1.7619 | -1.7619 |
| 1.4406 | 0.3906 | 200 | 1.6936 | 2.3177 | 2.1054 | 0.4264 | 0.2124 | -73.3108 | -70.8665 | -2.2879 | -2.2879 |
| 1.5623 | 0.4883 | 250 | 1.5911 | 0.8418 | 0.4811 | 0.4396 | 0.3607 | -76.5593 | -73.8184 | -1.5834 | -1.5834 |
| 1.8884 | 0.5859 | 300 | 1.5747 | 1.4552 | 1.2105 | 0.4418 | 0.2447 | -75.1005 | -72.5916 | -1.6640 | -1.6640 |
| 1.4373 | 0.6836 | 350 | 1.5569 | 1.3020 | 1.0909 | 0.4198 | 0.2111 | -75.3397 | -72.8979 | -1.9137 | -1.9136 |
| 1.4732 | 0.7812 | 400 | 1.5216 | 1.0023 | 0.6676 | 0.4571 | 0.3347 | -76.1863 | -73.4973 | -1.9794 | -1.9794 |
| 1.9109 | 0.8789 | 450 | 1.5502 | 1.3520 | 0.9986 | 0.4505 | 0.3534 | -75.5243 | -72.7979 | -1.8076 | -1.8076 |
| 1.4744 | 0.9766 | 500 | 1.5531 | 1.3605 | 1.1014 | 0.4264 | 0.2591 | -75.3186 | -72.7809 | -1.9385 | -1.9385 |
| 1.2615 | 1.0742 | 550 | 1.6623 | 0.6530 | 0.4114 | 0.4242 | 0.2415 | -76.6986 | -74.1960 | -2.3949 | -2.3949 |
| 1.8019 | 1.1719 | 600 | 1.6240 | 0.8707 | 0.6200 | 0.4308 | 0.2507 | -76.2815 | -73.7606 | -1.6149 | -1.6149 |
| 1.2202 | 1.2695 | 650 | 1.5993 | 1.1246 | 0.9014 | 0.4330 | 0.2233 | -75.7188 | -73.2527 | -1.8964 | -1.8964 |
| 1.0924 | 1.3672 | 700 | 1.5922 | 1.3888 | 1.1674 | 0.4242 | 0.2214 | -75.1866 | -72.7243 | -1.8455 | -1.8455 |
| 0.8059 | 1.4648 | 750 | 1.6004 | 1.1205 | 0.8834 | 0.4396 | 0.2371 | -75.7547 | -73.2610 | -1.9415 | -1.9415 |
| 0.9489 | 1.5625 | 800 | 1.5917 | 1.2725 | 1.0232 | 0.4264 | 0.2493 | -75.4751 | -72.9570 | -1.9293 | -1.9293 |
| 1.2564 | 1.6602 | 850 | 1.5797 | 1.1856 | 0.9286 | 0.4264 | 0.2570 | -75.6643 | -73.1308 | -1.8894 | -1.8894 |
| 1.2613 | 1.7578 | 900 | 1.5806 | 1.1682 | 0.9110 | 0.4308 | 0.2572 | -75.6995 | -73.1655 | -1.8963 | -1.8963 |
| 1.1197 | 1.8555 | 950 | 1.5804 | 1.1615 | 0.9030 | 0.4286 | 0.2585 | -75.7156 | -73.1791 | -1.8955 | -1.8955 |
| 0.7665 | 1.9531 | 1000 | 1.5804 | 1.1615 | 0.9028 | 0.4286 | 0.2587 | -75.7158 | -73.1790 | -1.8951 | -1.8951 |
### Framework versions
- Transformers 4.40.1
- Pytorch 2.0.0+cu117
- Datasets 2.19.0
- Tokenizers 0.19.1