# Llama-2-7b-hf-DPO-FullEval_LookAhead5_TTree1.2_TT0.7_TP0.7_TE0.1_Filtered0.1_V1.0

This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf), trained with DPO on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 0.9993
- Rewards/chosen: -2.7308
- Rewards/rejected: -3.0957
- Rewards/accuracies: 0.6667
- Rewards/margins: 0.3649
- Logps/rejected: -93.3609
- Logps/chosen: -104.7227
- Logits/rejected: -1.6352
- Logits/chosen: -1.6231
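For reference, the reward and loss metrics above follow the standard DPO formulation: each reward is β times the log-probability ratio between the policy and the reference model on a response, and the loss is the negative log-sigmoid of the chosen-vs-rejected reward margin. A minimal plain-Python sketch (the β value and the example log-probs are illustrative assumptions, not taken from this card):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_metrics(policy_chosen_logp, policy_rejected_logp,
                ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Compute the DPO loss and the reward quantities logged above.

    Rewards are beta-scaled log-prob ratios of policy vs. reference;
    the loss is -log sigmoid of the chosen/rejected reward margin.
    beta=0.1 is an assumed illustrative value.
    """
    rewards_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    rewards_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = rewards_chosen - rewards_rejected
    loss = -math.log(sigmoid(margin))
    return {"loss": loss,
            "rewards/chosen": rewards_chosen,
            "rewards/rejected": rewards_rejected,
            "rewards/margins": margin}

# Example (hypothetical log-probs): the chosen response gains log-prob
# relative to the reference while the rejected one loses it, giving a
# positive margin and a loss below log(2) ~= 0.693.
m = dpo_metrics(-100.0, -95.0, -98.0, -90.0)
```

A positive `rewards/margins` means the model ranks the chosen response above the rejected one, which is what `rewards/accuracies` counts over the evaluation set.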
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 3
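Note that `total_train_batch_size` (4) is simply `train_batch_size × gradient_accumulation_steps` (2 × 2). The learning-rate shape implied by `cosine` with 10 warmup steps can be sketched in plain Python as follows; this approximates the linear-warmup-then-cosine-decay curve of the usual `transformers` cosine scheduler, with `total_steps=780` taken from the final step in the results table below:

```python
import math

def cosine_lr_with_warmup(step, total_steps=780, warmup_steps=10, base_lr=5e-5):
    """Linear warmup to base_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, the rate is 0 at step 0, peaks at 5e-05 at step 10, and decays smoothly toward 0 by step 780.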
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.778 | 0.2994 | 78 | 0.6949 | -0.3298 | -0.3608 | 0.4167 | 0.0310 | -66.0123 | -80.7126 | -0.5458 | -0.5189 |
| 0.5895 | 0.5988 | 156 | 0.7221 | -0.3528 | -0.4053 | 0.5 | 0.0526 | -66.4578 | -80.9425 | -0.5269 | -0.4997 |
| 0.6423 | 0.8983 | 234 | 0.8029 | -0.5435 | -0.7369 | 0.5 | 0.1935 | -69.7736 | -82.8494 | -0.5770 | -0.5512 |
| 0.3985 | 1.1977 | 312 | 0.7639 | -0.4640 | -0.9647 | 0.5833 | 0.5008 | -72.0517 | -82.0546 | -0.7305 | -0.7083 |
| 0.4527 | 1.4971 | 390 | 0.8308 | -0.3638 | -0.5763 | 0.5 | 0.2125 | -68.1671 | -81.0529 | -0.9151 | -0.8936 |
| 0.3677 | 1.7965 | 468 | 0.7432 | -0.8212 | -1.2349 | 0.6667 | 0.4137 | -74.7538 | -85.6273 | -1.0459 | -1.0262 |
| 0.2591 | 2.0960 | 546 | 0.8634 | -1.6345 | -2.0043 | 0.5833 | 0.3699 | -82.4478 | -93.7598 | -1.2888 | -1.2721 |
| 0.1802 | 2.3954 | 624 | 1.1197 | -3.1423 | -3.3841 | 0.6667 | 0.2418 | -96.2452 | -108.8380 | -1.6577 | -1.6465 |
| 0.2054 | 2.6948 | 702 | 1.0008 | -2.7513 | -3.1213 | 0.6667 | 0.3700 | -93.6174 | -104.9277 | -1.6348 | -1.6227 |
| 0.2823 | 2.9942 | 780 | 0.9993 | -2.7308 | -3.0957 | 0.6667 | 0.3649 | -93.3609 | -104.7227 | -1.6352 | -1.6231 |
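The Rewards/accuracies column is the fraction of evaluation pairs for which the chosen response receives a higher implicit reward than the rejected one. A minimal sketch of that metric (the example reward values are hypothetical):

```python
def reward_accuracy(chosen_rewards, rejected_rewards):
    """Fraction of pairs where the chosen response out-scores the rejected one."""
    wins = sum(c > r for c, r in zip(chosen_rewards, rejected_rewards))
    return wins / len(chosen_rewards)

# Hypothetical example: 2 of 3 pairs ranked correctly -> 0.6667
acc = reward_accuracy([-2.7, -1.0, -3.2], [-3.1, -0.5, -3.5])
```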
### Framework versions
- PEFT 0.12.0
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1