# Llama-2-7b-hf-DPO-LookAhead-0_TTree1.4_TT0.9_TP0.7_TE0.2_V1

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on an unspecified dataset. It achieves the following results on the evaluation set (a sketch of how these DPO metrics are derived follows the list):
- Loss: 1.9544
- Rewards/chosen: -4.7879
- Rewards/rejected: -3.3303
- Rewards/accuracies: 0.4167
- Rewards/margins: -1.4576
- Logps/rejected: -139.8422
- Logps/chosen: -221.9621
- Logits/rejected: -0.3063
- Logits/chosen: -0.3612
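
The reward figures above follow the usual DPO convention: they are implicit rewards derived from the gap between policy and reference-model log-probabilities. The snippet below is a minimal sketch of how such metrics are typically computed; the `beta` value and the log-prob tensors are assumptions, since the card does not state them.

```python
import torch
import torch.nn.functional as F

def dpo_eval_metrics(policy_chosen_logps, policy_rejected_logps,
                     ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Compute DPO-style reward metrics from per-example sequence log-probs.

    All inputs are 1-D tensors of summed token log-probabilities.
    `beta` is an assumed value; this card does not report the one used.
    """
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = rewards_chosen - rewards_rejected
    # "Rewards/accuracies" is the fraction of pairs where the chosen response
    # gets a higher implicit reward than the rejected one.
    accuracy = (margins > 0).float().mean()
    loss = -F.logsigmoid(margins).mean()
    return {
        "rewards/chosen": rewards_chosen.mean().item(),
        "rewards/rejected": rewards_rejected.mean().item(),
        "rewards/margins": margins.mean().item(),
        "rewards/accuracies": accuracy.item(),
        "loss": loss.item(),
    }
```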
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (an illustrative trainer configuration follows the list):
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 3
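
For readers who want to set up a comparable run, the sketch below maps these hyperparameters onto TRL's `DPOConfig`/`DPOTrainer` with a PEFT adapter. The LoRA settings and the preference dataset are placeholders not stated in this card; treat this as an assumed reconstruction, not the exact training script.

```python
# Sketch only: reproduces the reported hyperparameters with TRL's DPOTrainer.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Hypothetical preference dataset with "prompt"/"chosen"/"rejected" columns.
dataset = load_dataset("your/preference-dataset")

# Assumed adapter settings; the card only indicates that PEFT was used.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

training_args = DPOConfig(
    output_dir="Llama-2-7b-hf-DPO-LookAhead-0_TTree1.4_TT0.9_TP0.7_TE0.2_V1",
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,   # effective train batch size of 4
    lr_scheduler_type="cosine",
    warmup_steps=10,
    num_train_epochs=3,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    processing_class=tokenizer,  # older TRL releases take tokenizer= instead
    peft_config=peft_config,
)
trainer.train()
```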
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.666         | 0.3008 | 80   | 0.7114          | -0.0994        | -0.0692          | 0.5                | -0.0302         | -107.2313      | -175.0767    | 0.3992          | 0.3563        |
| 0.7848        | 0.6015 | 160  | 0.7871          | -0.5530        | -0.4147          | 0.4167             | -0.1383         | -110.6864      | -179.6128    | 0.4061          | 0.3637        |
| 0.7413        | 0.9023 | 240  | 0.8345          | -0.6343        | -0.4162          | 0.4167             | -0.2181         | -110.7009      | -180.4258    | 0.3814          | 0.3393        |
| 0.5906        | 1.2030 | 320  | 1.0830          | -1.4953        | -0.8871          | 0.4167             | -0.6082         | -115.4103      | -189.0355    | 0.2654          | 0.2219        |
| 0.3771        | 1.5038 | 400  | 1.1984          | -2.0768        | -1.2714          | 0.3333             | -0.8053         | -119.2534      | -194.8505    | 0.1371          | 0.0921        |
| 0.2132        | 1.8045 | 480  | 1.2438          | -2.4881        | -1.6813          | 0.3333             | -0.8068         | -123.3516      | -198.9633    | 0.0444          | -0.0027       |
| 0.0544        | 2.1053 | 560  | 1.6818          | -3.7464        | -2.5485          | 0.1667             | -1.1979         | -132.0241      | -211.5465    | -0.1111         | -0.1621       |
| 0.0452        | 2.4060 | 640  | 1.8619          | -4.4511        | -3.1120          | 0.4167             | -1.3391         | -137.6592      | -218.5939    | -0.2407         | -0.2942       |
| 0.023         | 2.7068 | 720  | 1.9544          | -4.7879        | -3.3303          | 0.4167             | -1.4576         | -139.8422      | -221.9621    | -0.3063         | -0.3612       |
### Framework versions
- PEFT 0.13.2
- Transformers 4.45.2
- Pytorch 2.4.0+cu121
- Datasets 3.0.1
- Tokenizers 0.20.1
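
The PEFT entry in the list above suggests this repository contains a PEFT adapter rather than full model weights. Assuming that is the case, a minimal loading and generation sketch (the repo id is taken from this card; everything else is standard Transformers/PEFT usage):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-2-7b-hf"
adapter_id = "LBK95/Llama-2-7b-hf-DPO-LookAhead-0_TTree1.4_TT0.9_TP0.7_TE0.2_V1"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
# Attach the DPO-tuned adapter on top of the base model.
model = PeftModel.from_pretrained(base_model, adapter_id)

prompt = "Explain direct preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```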