# Llama-2-7b-hf-DPO-Filtered-0.2-version-3
This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 0.8860
- Rewards/chosen: -2.2683
- Rewards/rejected: -2.6831
- Rewards/accuracies: 0.5500
- Rewards/margins: 0.4148
- Logps/rejected: -83.1208
- Logps/chosen: -77.0439
- Logits/rejected: -1.3980
- Logits/chosen: -1.4004
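In DPO, each response's reward is the (beta-scaled) difference between the policy and reference log-probabilities, and the margin and accuracy metrics above follow directly from the per-pair chosen/rejected rewards. A minimal illustrative sketch of how those derived metrics relate (pure Python, not the trainer's actual code):

```python
import math

def dpo_eval_metrics(chosen_rewards, rejected_rewards):
    """Derive margin, accuracy, and DPO loss from per-pair rewards.

    Rewards are assumed to already include the beta scaling, as in the
    metrics reported above.
    """
    margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    # Rewards/margins: mean difference between chosen and rejected rewards.
    mean_margin = sum(margins) / len(margins)
    # Rewards/accuracies: fraction of pairs where chosen outscores rejected.
    accuracy = sum(m > 0 for m in margins) / len(margins)
    # DPO loss per pair: -log(sigmoid(margin)), averaged over pairs.
    losses = [-math.log(1.0 / (1.0 + math.exp(-m))) for m in margins]
    return mean_margin, accuracy, sum(losses) / len(losses)

# Final evaluation row: margin = -2.2683 - (-2.6831) = 0.4148.
margin, acc, loss = dpo_eval_metrics([-2.2683], [-2.6831])
```

Note that the reported evaluation loss is the mean of per-pair losses, so it does not in general equal `-log σ(mean margin)`.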
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 3
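Two of these values can be cross-checked: the effective batch size is `train_batch_size × gradient_accumulation_steps = 2 × 2 = 4`, matching `total_train_batch_size`, and the learning rate follows a linear warmup into a cosine decay. A sketch of that schedule shape (the 720 total steps come from the training log below; the exact `transformers` implementation may differ in detail):

```python
import math

LEARNING_RATE = 5e-05   # learning_rate
WARMUP_STEPS = 10       # lr_scheduler_warmup_steps
TOTAL_STEPS = 720       # final optimizer step in the training log below

def lr_at(step):
    """Linear warmup to the peak LR, then cosine decay toward zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))

# Effective batch size: train_batch_size * gradient_accumulation_steps.
EFFECTIVE_BATCH_SIZE = 2 * 2  # = total_train_batch_size (4)
```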
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6488 | 0.2994 | 72 | 0.6678 | -0.0112 | -0.0724 | 0.6000 | 0.0613 | -57.0147 | -54.4726 | -0.5812 | -0.5741 |
| 0.6000 | 0.5988 | 144 | 0.6316 | -0.4870 | -0.6800 | 0.6000 | 0.1930 | -63.0904 | -59.2310 | -0.6746 | -0.6711 |
| 0.5876 | 0.8981 | 216 | 0.6931 | -0.4396 | -0.5539 | 0.5000 | 0.1143 | -61.8289 | -58.7568 | -0.5937 | -0.5907 |
| 0.4949 | 1.1975 | 288 | 0.7890 | -0.7079 | -0.9614 | 0.6500 | 0.2535 | -65.9037 | -61.4400 | -0.8747 | -0.8740 |
| 0.5650 | 1.4969 | 360 | 0.9088 | -1.6793 | -1.8869 | 0.5500 | 0.2077 | -75.1596 | -71.1538 | -1.2245 | -1.2255 |
| 0.2830 | 1.7963 | 432 | 0.8288 | -1.8095 | -2.1999 | 0.6000 | 0.3905 | -78.2897 | -72.4555 | -1.2749 | -1.2766 |
| 0.1794 | 2.0956 | 504 | 0.8811 | -1.8931 | -2.2411 | 0.5500 | 0.3480 | -78.7009 | -73.2920 | -1.3148 | -1.3161 |
| 0.3907 | 2.3950 | 576 | 0.8772 | -2.2014 | -2.6107 | 0.5500 | 0.4093 | -82.3973 | -76.3750 | -1.4219 | -1.4232 |
| 0.0225 | 2.6944 | 648 | 0.8843 | -2.2655 | -2.6784 | 0.5500 | 0.4129 | -83.0741 | -77.0161 | -1.3961 | -1.3981 |
| 0.2077 | 2.9938 | 720 | 0.8860 | -2.2683 | -2.6831 | 0.5500 | 0.4148 | -83.1208 | -77.0439 | -1.3980 | -1.4004 |
### Framework versions
- PEFT 0.10.0
- Transformers 4.40.2
- Pytorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1