# Llama-2-7b-hf-DPO-LookAhead3_FullEval_TTree1.4_TLoop0.7_TEval0.2_Filter0.2_V3.0
This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) (the training dataset is not specified). It achieves the following results on the evaluation set:
- Loss: 0.5631
- Rewards/chosen: -2.3247
- Rewards/rejected: -3.0071
- Rewards/accuracies: 0.625
- Rewards/margins: 0.6824
- Logps/rejected: -155.5472
- Logps/chosen: -109.3707
- Logits/rejected: -1.3016
- Logits/chosen: -1.3197
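The reward metrics above follow the standard DPO convention: the implicit reward of a response is β·(log π(y|x) − log π_ref(y|x)), "Rewards/margins" is the chosen reward minus the rejected reward, and "Rewards/accuracies" is the fraction of pairs where the chosen reward is higher. A minimal sketch of how these metrics are derived from per-example log-probabilities (a hypothetical re-implementation, not code from this repo; β=0.1 is the common TRL default and is assumed here):

```python
# Hypothetical sketch of DPO reward-metric computation (not from this repo).
# Implicit DPO reward: beta * (logp_policy - logp_reference).
# beta=0.1 is an assumption (the common TRL default).

def dpo_reward_metrics(logp_chosen, logp_rejected,
                       ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # Per-example implicit rewards for chosen and rejected responses.
    rewards_chosen = [beta * (p - r) for p, r in zip(logp_chosen, ref_logp_chosen)]
    rewards_rejected = [beta * (p - r) for p, r in zip(logp_rejected, ref_logp_rejected)]
    # Margin: chosen reward minus rejected reward, per pair.
    margins = [c - r for c, r in zip(rewards_chosen, rewards_rejected)]
    # Accuracy: fraction of pairs where the chosen response wins.
    accuracy = sum(m > 0 for m in margins) / len(margins)
    return {
        "rewards/chosen": sum(rewards_chosen) / len(rewards_chosen),
        "rewards/rejected": sum(rewards_rejected) / len(rewards_rejected),
        "rewards/margins": sum(margins) / len(margins),
        "rewards/accuracies": accuracy,
    }
```

So the final margin of 0.6824 with accuracy 0.625 means the chosen response received a higher implicit reward on 62.5% of evaluation pairs.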
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 3
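The effective batch size is train_batch_size × gradient_accumulation_steps = 2 × 2 = 4, matching the listed total. The learning-rate schedule (linear warmup over 10 steps, then cosine decay) can be sketched as follows; this is a hypothetical re-implementation mirroring the default behaviour of transformers' `get_cosine_schedule_with_warmup`, not code from this repo:

```python
import math

def lr_at_step(step, total_steps, base_lr=5e-5, warmup_steps=10):
    """Hypothetical sketch of the cosine schedule with warmup used here.

    Linear warmup from 0 to base_lr over the first `warmup_steps`,
    then cosine decay from base_lr down to 0 at `total_steps`.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With only 10 warmup steps against hundreds of training steps, the schedule is dominated by the cosine decay, so the learning rate falls smoothly from 5e-05 toward zero over the three epochs.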
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------|:------|:-----|:----------------|:---------------|:-----------------|:-------------------|:----------------|:---------------|:-------------|:----------------|:---------------|
| 0.6474 | 0.3018 | 51 | 0.6491 | -0.0362 | -0.1070 | 0.5 | 0.0707 | -126.5454 | -86.4856 | -0.6602 | -0.6833 |
| 0.6614 | 0.6036 | 102 | 0.5967 | -0.0764 | -0.2716 | 0.625 | 0.1951 | -128.1913 | -86.8877 | -0.6723 | -0.6955 |
| 0.736 | 0.9053 | 153 | 0.6105 | -0.3083 | -0.5178 | 0.625 | 0.2095 | -130.6541 | -89.2063 | -0.7358 | -0.7574 |
| 0.4273 | 1.2071 | 204 | 0.5950 | -0.5205 | -0.8235 | 0.75 | 0.3030 | -133.7103 | -91.3283 | -0.8108 | -0.8319 |
| 0.4513 | 1.5089 | 255 | 0.5775 | -0.8673 | -1.1891 | 0.5 | 0.3218 | -137.3667 | -94.7965 | -0.8911 | -0.9112 |
| 0.376 | 1.8107 | 306 | 0.5885 | -0.9856 | -1.2703 | 0.375 | 0.2848 | -138.1790 | -95.9789 | -0.8967 | -0.9161 |
| 0.3154 | 2.1124 | 357 | 0.5543 | -1.3571 | -1.8062 | 0.625 | 0.4491 | -143.5375 | -99.6945 | -1.0781 | -1.0970 |
| 0.0512 | 2.4142 | 408 | 0.5432 | -1.8765 | -2.4774 | 0.75 | 0.6009 | -150.2498 | -104.8879 | -1.2016 | -1.2198 |
| 0.0875 | 2.7160 | 459 | 0.5631 | -2.3247 | -3.0071 | 0.625 | 0.6824 | -155.5472 | -109.3707 | -1.3016 | -1.3197 |
### Framework versions
- PEFT 0.12.0
- Transformers 4.44.2
- Pytorch 2.4.0+cu121
- Datasets 3.0.0
- Tokenizers 0.19.1