# merged_model_dpo
This model was trained with DPO (Direct Preference Optimization); the base model and training dataset are not specified in this card. It achieves the following results on the evaluation set:
- Loss: 0.0000
- Rewards/chosen: 0.2797
- Rewards/rejected: -17.5881
- Rewards/accuracies: 1.0
- Rewards/margins: 17.8678
- Logps/rejected: -299.3185
- Logps/chosen: -28.7786
- Logits/rejected: -3.7935
- Logits/chosen: -4.0383
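As a quick sanity check on the metrics above, the reported reward margin is simply the gap between the chosen and rejected rewards (values copied from the evaluation results; the computation itself is illustrative):

```python
# Verify Rewards/margins == Rewards/chosen - Rewards/rejected
# using the evaluation metrics reported above.
rewards_chosen = 0.2797
rewards_rejected = -17.5881

margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # 17.8678, matching Rewards/margins
```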
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2
- training_steps: 50
- mixed_precision_training: Native AMP
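For reference, the objective a DPO run like this optimizes can be sketched in pure Python. In DPO, each pair's reward is β times the policy-vs-reference log-probability ratio, and the loss is the negative log-sigmoid of the chosen-minus-rejected margin. Note this is an illustrative sketch: the β value and the reference-model log-probs below are assumptions, not values recorded in this card.

```python
import math

def dpo_loss(policy_logp_chosen: float, policy_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> tuple[float, float, float]:
    """Return (loss, reward_chosen, reward_rejected) for one preference pair.

    Rewards are beta-scaled log-ratios of policy vs. reference log-probs;
    the loss is -log(sigmoid(reward_chosen - reward_rejected)).
    """
    reward_chosen = beta * (policy_logp_chosen - ref_logp_chosen)
    reward_rejected = beta * (policy_logp_rejected - ref_logp_rejected)
    margin = reward_chosen - reward_rejected
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))
    return loss, reward_chosen, reward_rejected

# With a margin near 17.9 (as in the final evaluation above), the loss
# is driven to ~0, consistent with the 0.0000 validation losses.
# The reference log-probs here are hypothetical, picked so that beta=0.1
# reproduces rewards close to the reported ones.
loss, r_c, r_r = dpo_loss(-28.8, -299.3, -31.6, -123.4, beta=0.1)
print(loss < 1e-6)  # True
```

This also explains why the validation loss saturates at 0.0000 while the reward margin keeps growing: once the margin is large, the sigmoid is effectively 1 and the loss underflows the reported precision.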
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 0.6218 | 0.21 | 10 | 0.2547 | 0.5225 | -0.8859 | 1.0 | 1.4085 | -132.2969 | -26.3501 | -3.7422 | -3.8380 |
| 0.1216 | 0.43 | 20 | 0.0055 | 0.6847 | -5.7740 | 1.0 | 6.4587 | -181.1776 | -24.7284 | -3.7620 | -3.9325 |
| 0.0074 | 0.64 | 30 | 0.0000 | 0.4694 | -13.1598 | 1.0 | 13.6292 | -255.0354 | -26.8815 | -3.7881 | -4.0116 |
| 0.0001 | 0.85 | 40 | 0.0000 | 0.3177 | -16.7606 | 1.0 | 17.0783 | -291.0435 | -28.3980 | -3.7933 | -4.0344 |
| 0.0001 | 1.06 | 50 | 0.0000 | 0.2797 | -17.5881 | 1.0 | 17.8678 | -299.3185 | -28.7786 | -3.7935 | -4.0383 |
### Framework versions
- PEFT 0.7.2.dev0
- Transformers 4.37.0.dev0
- PyTorch 2.1.0+cu118
- Datasets 2.16.0
- Tokenizers 0.15.0