# Llama-2-7b-dpo-10k

This model is a version of meta-llama/Llama-2-7b-hf fine-tuned with direct preference optimization (DPO); the training data is not documented in this card. It achieves the following results on the evaluation set (the reward quantities are defined after the list):
- Loss: 0.7215
- Rewards/real: 5.3782
- Rewards/generated: 4.9113
- Rewards/accuracies: 0.6923
- Rewards/margins: 0.4668
- Logps/generated: -113.1980
- Logps/real: -125.7774
- Logits/generated: -1.1385
- Logits/real: -1.0466
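
The column names suggest the usual DPO bookkeeping, in which the "real" response of each pair is treated as preferred and the "generated" (model-sampled) response as rejected. Assuming the standard DPO objective (the β used for this run is not documented here), the implicit reward of a completion and the quantity reported as Rewards/margins are:

$$
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\text{ref}}(y \mid x) \right),
\qquad
\text{margin} = r_\theta(x, y_{\text{real}}) - r_\theta(x, y_{\text{generated}})
$$

Under that formulation the per-pair loss is $-\log \sigma(\text{margin})$, and Rewards/accuracies is the fraction of evaluation pairs with a positive margin.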
## Model description
More information needed
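
No further description is provided. As a starting point, the snippet below is a minimal text-generation sketch using the transformers library; it assumes the checkpoint is published under the repo id AmberYifan/Llama-2-7b-dpo-10k and loads like any other Llama-2 causal LM. The dtype, device placement, and prompt are illustrative choices, not part of this card.

```python
# Minimal sketch: load the checkpoint as a standard Llama-2 causal LM.
# The repo id and generation settings are assumptions, not documented in this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AmberYifan/Llama-2-7b-dpo-10k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 fits a 7B model on a single ~16 GB GPU
    device_map="auto",          # requires the `accelerate` package
)

prompt = "Explain what direct preference optimization is in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```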
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a sketch mapping them onto `transformers.TrainingArguments` follows the list):
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
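
The training script itself is not documented. The sketch below only shows how the hyperparameters listed above map onto `transformers.TrainingArguments`; the DPO-specific pieces (reference model, β, preference dataset, trainer) are intentionally omitted, and `output_dir` is a placeholder.

```python
# Sketch only: maps the hyperparameters from this card onto TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-2-7b-dpo-10k",   # placeholder, not from the card
    learning_rate=5e-7,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,      # 4 GPUs x 4 per device x 2 steps = 32 effective
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```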
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/real | Rewards/generated | Rewards/accuracies | Rewards/margins | Logps/generated | Logps/real | Logits/generated | Logits/real |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.8559 | 0.1984 | 62 | 0.8605 | 0.4128 | 0.4099 | 0.4808 | 0.0029 | -158.2126 | -175.4314 | -0.8219 | -0.6123 |
| 0.7999 | 0.3968 | 124 | 0.8323 | 1.5863 | 1.5154 | 0.5192 | 0.0709 | -147.1573 | -163.6966 | -0.8057 | -0.6067 |
| 0.7846 | 0.5952 | 186 | 0.7979 | 2.4470 | 2.3135 | 0.5577 | 0.1335 | -139.1767 | -155.0893 | -0.8686 | -0.6862 |
| 0.7916 | 0.7936 | 248 | 0.7819 | 3.0117 | 2.8464 | 0.6346 | 0.1653 | -133.8475 | -149.4422 | -0.9049 | -0.7322 |
| 0.7714 | 0.992 | 310 | 0.7630 | 3.4214 | 3.1941 | 0.6346 | 0.2273 | -130.3704 | -145.3455 | -0.9511 | -0.7905 |
| 0.678 | 1.1904 | 372 | 0.7552 | 3.9523 | 3.6931 | 0.6538 | 0.2592 | -125.3802 | -140.0360 | -0.9800 | -0.8279 |
| 0.6337 | 1.3888 | 434 | 0.7464 | 4.4541 | 4.1602 | 0.6346 | 0.2939 | -120.7093 | -135.0177 | -1.0279 | -0.8860 |
| 0.6575 | 1.5872 | 496 | 0.7352 | 4.8501 | 4.4918 | 0.6538 | 0.3583 | -117.3935 | -131.0585 | -1.0562 | -0.9285 |
| 0.6606 | 1.7856 | 558 | 0.7270 | 5.1119 | 4.7485 | 0.6538 | 0.3634 | -114.8267 | -128.4403 | -1.0969 | -0.9780 |
| 0.6319 | 1.984 | 620 | 0.7260 | 5.2581 | 4.8563 | 0.6538 | 0.4018 | -113.7479 | -126.9782 | -1.0953 | -0.9815 |
| 0.552 | 2.1824 | 682 | 0.7295 | 5.3469 | 4.9377 | 0.6731 | 0.4092 | -112.9344 | -126.0898 | -1.1133 | -1.0072 |
| 0.5541 | 2.3808 | 744 | 0.7229 | 5.4093 | 4.9819 | 0.6923 | 0.4274 | -112.4924 | -125.4664 | -1.1322 | -1.0330 |
| 0.5342 | 2.5792 | 806 | 0.7246 | 5.3967 | 4.9520 | 0.6923 | 0.4447 | -112.7909 | -125.5919 | -1.1353 | -1.0397 |
| 0.5318 | 2.7776 | 868 | 0.7229 | 5.3656 | 4.9040 | 0.6731 | 0.4615 | -113.2710 | -125.9033 | -1.1367 | -1.0427 |
| 0.5396 | 2.976 | 930 | 0.7215 | 5.3782 | 4.9113 | 0.6923 | 0.4668 | -113.1980 | -125.7774 | -1.1385 | -1.0466 |
### Framework versions
- Transformers 4.43.3
- Pytorch 2.2.2+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1