---
license: llama2
library_name: peft
tags:
  - trl
  - dpo
  - generated_from_trainer
base_model: meta-llama/Llama-2-7b-hf
model-index:
  - name: llama_DPO_model_e2
    results: []
---

# llama_DPO_model_e2

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.0739
- Rewards/chosen: 0.4632
- Rewards/rejected: -2.2899
- Rewards/accuracies: 1.0
- Rewards/margins: 2.7530
- Logps/rejected: -207.7081
- Logps/chosen: -156.0022
- Logits/rejected: -1.0521
- Logits/chosen: -0.8598
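In DPO, `Rewards/chosen` and `Rewards/rejected` are the beta-scaled log-probability ratios of the policy versus the reference model, and `Rewards/margins` is simply their difference; the loss is the negative log-sigmoid of that margin. A quick dependency-free sanity check of the reported numbers (the small mismatch is rounding in the report, and the reported eval loss averages over examples rather than being evaluated at the mean margin):

```python
import math

# Reported evaluation metrics from the list above.
rewards_chosen = 0.4632
rewards_rejected = -2.2899

# Rewards/margins is defined as chosen minus rejected.
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # 2.7531, matching the reported 2.7530 up to rounding

# Per-example DPO loss is -log(sigmoid(margin)) once beta is folded into
# the rewards, so a large positive margin drives the loss toward zero.
loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
print(round(loss, 4))
```

This also explains why `Rewards/accuracies` reaching 1.0 coincides with the loss flattening out: once every chosen completion out-scores its rejected counterpart, further training only widens the margin.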

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 9e-07
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
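For orientation, the hyperparameters above roughly correspond to a TRL `DPOTrainer` run on a PEFT adapter, sketched below. Only the listed hyperparameters come from this card; the dataset contents, the `beta` value, and the LoRA settings are illustrative assumptions (this is a configuration sketch, not a runnable reproduction — training requires the gated Llama-2 weights and a GPU):

```python
# Sketch only: maps the card's hyperparameters onto a TRL DPOTrainer setup.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# DPO expects prompt/chosen/rejected columns; contents here are placeholders.
preference_dataset = Dataset.from_dict({
    "prompt": ["Explain DPO in one sentence."],
    "chosen": ["DPO optimizes a policy directly from preference pairs."],
    "rejected": ["DPO is a kind of tokenizer."],
})

training_args = TrainingArguments(
    output_dir="llama_DPO_model_e2",
    learning_rate=9e-7,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,  # effective train batch size of 8
    num_train_epochs=2,
    lr_scheduler_type="linear",
    seed=42,
)

# LoRA adapter config: values are illustrative, not stated in the card.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=16)

trainer = DPOTrainer(
    model,
    ref_model=None,  # with a PEFT adapter, the frozen base acts as reference
    args=training_args,
    train_dataset=preference_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,  # assumed; beta is not reported in this card
)
trainer.train()
```

Passing `ref_model=None` together with a `peft_config` is the memory-saving pattern for adapter training: the trainer disables the adapter to score the reference policy instead of keeping a second full model in memory.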

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6694        | 0.1   | 25   | 0.6365          | 0.0370         | -0.0813          | 0.9433             | 0.1183          | -185.6225      | -160.2637    | -1.0521         | -0.8545       |
| 0.5526        | 0.2   | 50   | 0.5246          | 0.1015         | -0.2765          | 0.9967             | 0.3780          | -187.5744      | -159.6185    | -1.0524         | -0.8560       |
| 0.4607        | 0.3   | 75   | 0.4173          | 0.1669         | -0.5106          | 1.0                | 0.6775          | -189.9152      | -158.9647    | -1.0530         | -0.8562       |
| 0.3595        | 0.4   | 100  | 0.3251          | 0.2304         | -0.7635          | 1.0                | 0.9940          | -192.4449      | -158.3297    | -1.0530         | -0.8567       |
| 0.297         | 0.5   | 125  | 0.2521          | 0.2883         | -1.0189          | 1.0                | 1.3072          | -194.9990      | -157.7509    | -1.0526         | -0.8573       |
| 0.2217        | 0.6   | 150  | 0.1968          | 0.3313         | -1.2778          | 1.0                | 1.6090          | -197.5871      | -157.3212    | -1.0525         | -0.8576       |
| 0.1832        | 0.7   | 175  | 0.1539          | 0.3750         | -1.5241          | 1.0                | 1.8991          | -200.0504      | -156.8834    | -1.0531         | -0.8606       |
| 0.1374        | 0.79  | 200  | 0.1238          | 0.4055         | -1.7491          | 1.0                | 2.1546          | -202.3004      | -156.5787    | -1.0525         | -0.8614       |
| 0.116         | 0.89  | 225  | 0.1027          | 0.4306         | -1.9426          | 1.0                | 2.3732          | -204.2353      | -156.3275    | -1.0526         | -0.8606       |
| 0.095         | 0.99  | 250  | 0.0898          | 0.4405         | -2.0888          | 1.0                | 2.5293          | -205.6978      | -156.2289    | -1.0523         | -0.8603       |
| 0.0921        | 1.09  | 275  | 0.0831          | 0.4465         | -2.1733          | 1.0                | 2.6198          | -206.5422      | -156.1685    | -1.0524         | -0.8593       |
| 0.0734        | 1.19  | 300  | 0.0793          | 0.4520         | -2.2224          | 1.0                | 2.6744          | -207.0332      | -156.1135    | -1.0519         | -0.8627       |
| 0.0711        | 1.29  | 325  | 0.0766          | 0.4558         | -2.2584          | 1.0                | 2.7142          | -207.3936      | -156.0763    | -1.0520         | -0.8592       |
| 0.0806        | 1.39  | 350  | 0.0754          | 0.4630         | -2.2725          | 1.0                | 2.7355          | -207.5350      | -156.0041    | -1.0520         | -0.8599       |
| 0.079         | 1.49  | 375  | 0.0748          | 0.4622         | -2.2779          | 1.0                | 2.7401          | -207.5887      | -156.0115    | -1.0522         | -0.8602       |
| 0.0711        | 1.59  | 400  | 0.0746          | 0.4615         | -2.2817          | 1.0                | 2.7432          | -207.6269      | -156.0192    | -1.0519         | -0.8603       |
| 0.0689        | 1.69  | 425  | 0.0744          | 0.4624         | -2.2862          | 1.0                | 2.7486          | -207.6718      | -156.0103    | -1.0522         | -0.8594       |
| 0.0809        | 1.79  | 450  | 0.0742          | 0.4631         | -2.2887          | 1.0                | 2.7518          | -207.6965      | -156.0032    | -1.0517         | -0.8610       |
| 0.0759        | 1.89  | 475  | 0.0740          | 0.4629         | -2.2902          | 1.0                | 2.7531          | -207.7117      | -156.0047    | -1.0517         | -0.8594       |
| 0.0758        | 1.99  | 500  | 0.0739          | 0.4632         | -2.2899          | 1.0                | 2.7530          | -207.7081      | -156.0022    | -1.0521         | -0.8598       |

### Framework versions

- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 2.15.2