---
license: llama2
library_name: peft
tags:
  - trl
  - dpo
  - generated_from_trainer
base_model: meta-llama/Llama-2-7b-hf
model-index:
  - name: llama_DPO_model_e2
    results: []
---

# llama_DPO_model_e2

This model is a PEFT adapter for [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf), trained with DPO (Direct Preference Optimization) via TRL on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.0572
- Rewards/chosen: 0.4916
- Rewards/rejected: -2.5677
- Rewards/accuracies: 1.0
- Rewards/margins: 3.0592
- Logps/rejected: -210.4865
- Logps/chosen: -155.7183
- Logits/rejected: -1.0527
- Logits/chosen: -0.8611
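A note on how these metrics relate: in DPO, Rewards/margins is simply Rewards/chosen minus Rewards/rejected, and the per-pair loss is -log sigmoid(margin). The sketch below (plain Python, no card-specific values beyond those listed; beta and the underlying log-probs are not stated, so only the margin identity can be reproduced exactly) checks this against the reported numbers:

```python
import math

# Reported eval metrics from the card
rewards_chosen = 0.4916
rewards_rejected = -2.5677

# Rewards/margins is the gap between chosen and rejected rewards.
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # 3.0593 (card reports 3.0592; rounding)

# Per-pair DPO loss is -log(sigmoid(margin)); a large positive margin pushes
# the loss toward zero. Applying it to the *mean* margin gives roughly 0.046,
# close to but not equal to the reported eval loss (0.0572), which averages
# the per-example losses instead.
loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
```

The perfect Rewards/accuracies of 1.0 is consistent with this: every evaluation pair ends with a positive margin.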

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-06
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
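Two of the listed values are derived from the others. A plain-Python sketch (zero warmup is assumed, since none is listed; ~500 optimizer steps is taken from the results table and is illustrative only):

```python
# Derived quantities from the hyperparameters above.
learning_rate = 1e-6
train_batch_size = 1               # per device
gradient_accumulation_steps = 8

# The listed total_train_batch_size is the product of the two:
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 8

# Linear scheduler with zero warmup: LR decays from the base rate to zero
# over the run.
def linear_lr(step: int, total_steps: int, base_lr: float = learning_rate) -> float:
    return base_lr * max(0.0, 1.0 - step / total_steps)

print(linear_lr(250, 500))  # 5e-07: halfway through, half the base rate
```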

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6588        | 0.1   | 25   | 0.6197          | 0.0430         | -0.1117          | 0.9633             | 0.1547          | -185.9265      | -160.2034    | -1.0522         | -0.8546       |
| 0.5198        | 0.2   | 50   | 0.4923          | 0.1198         | -0.3424          | 0.9933             | 0.4622          | -188.2335      | -159.4357    | -1.0525         | -0.8554       |
| 0.422         | 0.3   | 75   | 0.3707          | 0.2016         | -0.6277          | 1.0                | 0.8293          | -191.0862      | -158.6175    | -1.0532         | -0.8571       |
| 0.3133        | 0.4   | 100  | 0.2775          | 0.2622         | -0.9287          | 1.0                | 1.1908          | -194.0961      | -158.0122    | -1.0529         | -0.8575       |
| 0.2536        | 0.5   | 125  | 0.2077          | 0.3244         | -1.2160          | 1.0                | 1.5403          | -196.9694      | -157.3904    | -1.0527         | -0.8608       |
| 0.181         | 0.6   | 150  | 0.1559          | 0.3746         | -1.5115          | 1.0                | 1.8860          | -199.9242      | -156.8883    | -1.0534         | -0.8595       |
| 0.1457        | 0.7   | 175  | 0.1203          | 0.4136         | -1.7795          | 1.0                | 2.1931          | -202.6049      | -156.4983    | -1.0534         | -0.8620       |
| 0.1072        | 0.79  | 200  | 0.0950          | 0.4439         | -2.0245          | 1.0                | 2.4684          | -205.0550      | -156.1949    | -1.0532         | -0.8613       |
| 0.0921        | 0.89  | 225  | 0.0792          | 0.4625         | -2.2196          | 1.0                | 2.6821          | -207.0056      | -156.0085    | -1.0535         | -0.8604       |
| 0.0732        | 0.99  | 250  | 0.0694          | 0.4721         | -2.3665          | 1.0                | 2.8387          | -208.4748      | -155.9124    | -1.0530         | -0.8609       |
| 0.0703        | 1.09  | 275  | 0.0636          | 0.4762         | -2.4589          | 1.0                | 2.9351          | -209.3987      | -155.8720    | -1.0527         | -0.8600       |
| 0.0554        | 1.19  | 300  | 0.0606          | 0.4841         | -2.5053          | 1.0                | 2.9894          | -209.8628      | -155.7928    | -1.0528         | -0.8614       |
| 0.0532        | 1.29  | 325  | 0.0592          | 0.4869         | -2.5331          | 1.0                | 3.0200          | -210.1407      | -155.7649    | -1.0527         | -0.8606       |
| 0.061         | 1.39  | 350  | 0.0580          | 0.4912         | -2.5550          | 1.0                | 3.0462          | -210.3595      | -155.7218    | -1.0525         | -0.8611       |
| 0.0612        | 1.49  | 375  | 0.0573          | 0.4930         | -2.5633          | 1.0                | 3.0563          | -210.4424      | -155.7034    | -1.0527         | -0.8613       |
| 0.0539        | 1.59  | 400  | 0.0576          | 0.4921         | -2.5602          | 1.0                | 3.0523          | -210.4118      | -155.7133    | -1.0529         | -0.8596       |
| 0.0517        | 1.69  | 425  | 0.0570          | 0.4917         | -2.5691          | 1.0                | 3.0608          | -210.5005      | -155.7172    | -1.0529         | -0.8602       |
| 0.0627        | 1.79  | 450  | 0.0570          | 0.4938         | -2.5669          | 1.0                | 3.0607          | -210.4783      | -155.6961    | -1.0532         | -0.8608       |
| 0.0575        | 1.89  | 475  | 0.0574          | 0.4911         | -2.5664          | 1.0                | 3.0574          | -210.4731      | -155.7233    | -1.0528         | -0.8612       |
| 0.0578        | 1.99  | 500  | 0.0572          | 0.4916         | -2.5677          | 1.0                | 3.0592          | -210.4865      | -155.7183    | -1.0527         | -0.8611       |

### Framework versions

- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2
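To recreate a matching environment, the versions above can be pinned like so (a convenience sketch, not an official requirements file; the TRL version used for DPO training is not listed on the card):

```shell
# Pin the library versions listed above.
pip install "peft==0.8.2" "transformers==4.38.1" "datasets==2.17.1" "tokenizers==0.15.2"

# PyTorch 2.2.0 built for CUDA 11.8, from the official wheel index.
pip install "torch==2.2.0" --index-url https://download.pytorch.org/whl/cu118

# TRL is needed for DPO; exact version not stated on the card.
pip install trl
```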