# Llama-2-7b-hf-DPO-PartialEval_LookAhead5_ET0.1_MT1.2_1-5_Filtered0.1_V2.0
This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 1.0109
- Rewards/chosen: -2.4672
- Rewards/rejected: -2.3527
- Rewards/accuracies: 0.5
- Rewards/margins: -0.1144
- Logps/rejected: -101.7496
- Logps/chosen: -111.6802
- Logits/rejected: -1.3387
- Logits/chosen: -1.3182
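For context, the reward figures above follow the usual DPO convention: a response's implicit reward is β times the difference between the policy and reference log-probabilities, and the margin is the chosen reward minus the rejected reward. A minimal sketch of that bookkeeping (the β value and log-probabilities below are illustrative assumptions, not values from this run):

```python
def dpo_reward(logp_policy, logp_ref, beta=0.1):
    """Implicit DPO reward: beta * (policy log-prob - reference log-prob)."""
    return beta * (logp_policy - logp_ref)

def reward_margin(chosen_reward, rejected_reward):
    """Rewards/margins as reported above: chosen minus rejected."""
    return chosen_reward - rejected_reward

# Illustrative numbers only (not taken from this training run):
chosen = dpo_reward(logp_policy=-111.7, logp_ref=-87.0, beta=0.1)
rejected = dpo_reward(logp_policy=-101.7, logp_ref=-79.2, beta=0.1)
margin = reward_margin(chosen, rejected)
# A negative margin, as in the final evaluation above, means the rejected
# response is currently assigned a higher implicit reward than the chosen one.
```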
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 3
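The effective batch size and the learning-rate schedule above can be sketched as follows. This is a simplified re-implementation assuming linear warmup over 10 steps followed by cosine decay to zero; the exact schedule used by the trainer may differ in detail:

```python
import math

TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 2
# total_train_batch_size = 4, as listed above
TOTAL_TRAIN_BATCH_SIZE = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS

def cosine_lr_with_warmup(step, total_steps, peak_lr=5e-5, warmup_steps=10):
    """Linear warmup to peak_lr, then cosine decay toward zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# The learning rate ramps to 5e-5 at step 10, then decays to ~0 by the last step.
```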
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6944 | 0.3026 | 77 | 0.7061 | -0.0481 | -0.0149 | 0.3333 | -0.0332 | -78.3709 | -87.4894 | -0.4703 | -0.4454 |
| 0.7162 | 0.6051 | 154 | 0.7499 | -0.0798 | 0.0370 | 0.25 | -0.1168 | -77.8523 | -87.8063 | -0.5650 | -0.5389 |
| 0.7265 | 0.9077 | 231 | 0.7156 | -0.1175 | -0.0976 | 0.5833 | -0.0199 | -79.1982 | -88.1833 | -0.5730 | -0.5467 |
| 0.6934 | 1.2102 | 308 | 0.8015 | -0.6012 | -0.5127 | 0.5 | -0.0884 | -83.3497 | -93.0202 | -0.7454 | -0.7223 |
| 0.4346 | 1.5128 | 385 | 0.8278 | -0.8704 | -0.8319 | 0.5 | -0.0385 | -86.5415 | -95.7126 | -0.9246 | -0.9032 |
| 0.6773 | 1.8153 | 462 | 0.7807 | -0.8712 | -0.8972 | 0.5 | 0.0260 | -87.1945 | -95.7207 | -0.8954 | -0.8734 |
| 0.3446 | 2.1179 | 539 | 0.8424 | -1.5623 | -1.5383 | 0.5833 | -0.0240 | -93.6050 | -102.6317 | -1.0923 | -1.0707 |
| 0.1483 | 2.4204 | 616 | 0.9759 | -2.2720 | -2.1419 | 0.5833 | -0.1301 | -99.6412 | -109.7289 | -1.2875 | -1.2666 |
| 0.213 | 2.7230 | 693 | 1.0109 | -2.4672 | -2.3527 | 0.5 | -0.1144 | -101.7496 | -111.6802 | -1.3387 | -1.3182 |
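The validation loss above is the standard sigmoid-form DPO loss, which for a single example is -log σ(margin), with β already folded into the reported rewards. A minimal per-example sketch (the trainer averages this over batches, so plugging the aggregate margin in does not reproduce the reported loss exactly):

```python
import math

def dpo_loss(chosen_reward, rejected_reward):
    """Per-example sigmoid DPO loss: -log(sigmoid(chosen - rejected))."""
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A positive margin drives the loss below log(2) ~= 0.693;
# a negative margin (as in the final rows above) drives it above log(2).
```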
### Framework versions
- PEFT 0.12.0
- Transformers 4.43.3
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1