---
license: llama2
library_name: peft
tags:
  - trl
  - dpo
  - generated_from_trainer
base_model: meta-llama/Llama-2-7b-hf
model-index:
  - name: llama_DPO_model_e2
    results: []
---

# llama_DPO_model_e2

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.1205
- Rewards/chosen: 0.4005
- Rewards/rejected: -1.7841
- Rewards/accuracies: 1.0
- Rewards/margins: 2.1847
- Logps/rejected: -202.6509
- Logps/chosen: -156.6288
- Logits/rejected: -1.0515
- Logits/chosen: -0.8581
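
The reward metrics above are DPO's implicit rewards: beta-scaled log-probability ratios between the trained policy and the frozen reference model, with the margin being chosen reward minus rejected reward. A minimal sketch of the sigmoid DPO loss for one preference pair (`beta=0.1` is the trl default and an assumption here; the card does not state the value used):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sigmoid DPO loss for a single preference pair.

    The "rewards" are the beta-scaled log-prob ratios between the policy
    and the reference model; the margin is chosen minus rejected reward.
    beta=0.1 is assumed (trl default); this card does not list it.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))
    return loss, chosen_reward, rejected_reward, margin
```

The reported metrics are internally consistent with this definition: 0.4005 − (−1.7841) ≈ 2.1847, the reported margin, and a large positive margin drives the per-pair loss toward 0, matching the fall in validation loss from 0.6561 to 0.1205.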

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 7e-07
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
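
The effective optimization batch of 8 comes from accumulating 8 micro-batches of size 1, and the linear scheduler decays the learning rate from 7e-07 toward 0 over training. A small sketch of both (zero warmup and the roughly 500-step horizon visible in the table below are assumptions; the card lists neither):

```python
def linear_lr(step, total_steps, base_lr=7e-7, warmup_steps=0):
    """Linear schedule in the style of transformers'
    get_linear_schedule_with_warmup: ramp up over warmup_steps,
    then decay linearly to 0 at total_steps.
    warmup_steps=0 is an assumption; the card does not list warmup."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# Effective optimization batch: per-device batch x gradient accumulation steps.
train_batch_size, grad_accum = 1, 8
effective_batch = train_batch_size * grad_accum  # matches total_train_batch_size: 8
```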

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6753        | 0.1   | 25   | 0.6561          | 0.0241         | -0.0529          | 0.8800             | 0.0770          | -185.3385      | -160.3932    | -1.0518         | -0.8547       |
| 0.596         | 0.2   | 50   | 0.5763          | 0.0663         | -0.1863          | 0.9933             | 0.2525          | -186.6722      | -159.9714    | -1.0527         | -0.8563       |
| 0.5265        | 0.3   | 75   | 0.4888          | 0.1230         | -0.3480          | 1.0                | 0.4710          | -188.2895      | -159.4043    | -1.0529         | -0.8557       |
| 0.4405        | 0.4   | 100  | 0.4115          | 0.1711         | -0.5248          | 1.0                | 0.6959          | -190.0574      | -158.9227    | -1.0521         | -0.8557       |
| 0.3832        | 0.5   | 125  | 0.3418          | 0.2187         | -0.7108          | 1.0                | 0.9295          | -191.9176      | -158.4473    | -1.0530         | -0.8571       |
| 0.3071        | 0.6   | 150  | 0.2809          | 0.2614         | -0.9143          | 1.0                | 1.1757          | -193.9524      | -158.0195    | -1.0526         | -0.8568       |
| 0.2635        | 0.7   | 175  | 0.2300          | 0.3051         | -1.1158          | 1.0                | 1.4209          | -195.9679      | -157.5830    | -1.0531         | -0.8575       |
| 0.2056        | 0.79  | 200  | 0.1912          | 0.3381         | -1.3041          | 1.0                | 1.6422          | -197.8509      | -157.2532    | -1.0529         | -0.8577       |
| 0.1735        | 0.89  | 225  | 0.1617          | 0.3637         | -1.4760          | 1.0                | 1.8397          | -199.5699      | -156.9968    | -1.0524         | -0.8580       |
| 0.1492        | 0.99  | 250  | 0.1416          | 0.3797         | -1.6179          | 1.0                | 1.9976          | -200.9889      | -156.8374    | -1.0521         | -0.8575       |
| 0.144         | 1.09  | 275  | 0.1304          | 0.3918         | -1.6997          | 1.0                | 2.0915          | -201.8062      | -156.7157    | -1.0517         | -0.8590       |
| 0.1203        | 1.19  | 300  | 0.1255          | 0.3955         | -1.7398          | 1.0                | 2.1353          | -202.2080      | -156.6790    | -1.0514         | -0.8580       |
| 0.117         | 1.29  | 325  | 0.1229          | 0.3961         | -1.7635          | 1.0                | 2.1596          | -202.4451      | -156.6730    | -1.0514         | -0.8572       |
| 0.1286        | 1.39  | 350  | 0.1209          | 0.4018         | -1.7766          | 1.0                | 2.1784          | -202.5752      | -156.6156    | -1.0517         | -0.8587       |
| 0.126         | 1.49  | 375  | 0.1199          | 0.4025         | -1.7866          | 1.0                | 2.1891          | -202.6759      | -156.6091    | -1.0517         | -0.8587       |
| 0.1154        | 1.59  | 400  | 0.1202          | 0.4013         | -1.7865          | 1.0                | 2.1877          | -202.6743      | -156.6213    | -1.0514         | -0.8580       |
| 0.1141        | 1.69  | 425  | 0.1200          | 0.3990         | -1.7907          | 1.0                | 2.1897          | -202.7168      | -156.6437    | -1.0518         | -0.8578       |
| 0.1284        | 1.79  | 450  | 0.1196          | 0.4012         | -1.7899          | 1.0                | 2.1910          | -202.7081      | -156.6221    | -1.0518         | -0.8582       |
| 0.1225        | 1.89  | 475  | 0.1205          | 0.3984         | -1.7858          | 1.0                | 2.1842          | -202.6674      | -156.6495    | -1.0517         | -0.8592       |
| 0.1224        | 1.99  | 500  | 0.1205          | 0.4005         | -1.7841          | 1.0                | 2.1847          | -202.6509      | -156.6288    | -1.0515         | -0.8581       |

### Framework versions

- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2