---
license: llama2
library_name: peft
tags:
  - trl
  - dpo
  - generated_from_trainer
base_model: meta-llama/Llama-2-7b-hf
model-index:
  - name: llama_DPO_model_e2
    results: []
---

# llama_DPO_model_e2

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0896
  • Rewards/chosen: 0.4401
  • Rewards/rejected: -2.0930
  • Rewards/accuracies: 1.0
  • Rewards/margins: 2.5330
  • Logps/rejected: -205.7391
  • Logps/chosen: -156.2334
  • Logits/rejected: -1.0514
  • Logits/chosen: -0.8587
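For context on how these columns relate (this is not part of the original card): in trl's standard DPO formulation, the reward columns are derived from the difference in sequence log-probabilities between the trained policy and the frozen reference model, scaled by the DPO `beta`. A minimal sketch of that arithmetic, assuming the standard formulation; the `beta=0.1` default here is hypothetical, since the card does not record it:

```python
import math

def dpo_stats(policy_chosen_logp, ref_chosen_logp,
              policy_rejected_logp, ref_rejected_logp, beta=0.1):
    """Arithmetic behind the Rewards/* and Loss columns.

    Assumes trl's convention: reward = beta * (policy logp - reference logp).
    beta=0.1 is a hypothetical value; the card does not record the one used.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # Per-pair DPO loss: -log(sigmoid(margin)); beta is already folded
    # into the rewards, so the margin is used directly.
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return chosen_reward, rejected_reward, margin, loss
```

Consistent with this, the final Rewards/margins value above (2.5330) is, up to rounding, Rewards/chosen minus Rewards/rejected: 0.4401 - (-2.0930).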

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-07
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 2
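As a rough illustration (not the author's actual training script), the hyperparameters above map onto transformers' `TrainingArguments` as follows; `output_dir` is a placeholder, and the Adam betas/epsilon listed above are the transformers defaults, so they need no explicit arguments:

```python
from transformers import TrainingArguments

# Sketch of the listed hyperparameters as a TrainingArguments config.
args = TrainingArguments(
    output_dir="llama_DPO_model_e2",  # placeholder, not from the card
    learning_rate=8e-7,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=8,    # effective train batch size: 1 * 8 = 8
    lr_scheduler_type="linear",
    num_train_epochs=2,
)
```

The total_train_batch_size of 8 is not set directly: it is the product of `per_device_train_batch_size` (1) and `gradient_accumulation_steps` (8).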

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6699        | 0.1   | 25   | 0.6428          | 0.0307         | -0.0744          | 0.9033             | 0.1051          | -185.5532      | -160.3267    | -1.0520         | -0.8550       |
| 0.5702        | 0.2   | 50   | 0.5471          | 0.0866         | -0.2359          | 0.9933             | 0.3225          | -187.1690      | -159.7680    | -1.0514         | -0.8544       |
| 0.488         | 0.3   | 75   | 0.4456          | 0.1502         | -0.4424          | 1.0                | 0.5926          | -189.2334      | -159.1314    | -1.0527         | -0.8555       |
| 0.3957        | 0.4   | 100  | 0.3600          | 0.2054         | -0.6615          | 1.0                | 0.8669          | -191.4245      | -158.5795    | -1.0530         | -0.8577       |
| 0.3338        | 0.5   | 125  | 0.2865          | 0.2569         | -0.8933          | 1.0                | 1.1502          | -193.7425      | -158.0646    | -1.0524         | -0.8564       |
| 0.253         | 0.6   | 150  | 0.2257          | 0.3043         | -1.1373          | 1.0                | 1.4416          | -196.1830      | -157.5914    | -1.0523         | -0.8570       |
| 0.2134        | 0.7   | 175  | 0.1819          | 0.3496         | -1.3537          | 1.0                | 1.7033          | -198.3466      | -157.1379    | -1.0530         | -0.8584       |
| 0.1613        | 0.79  | 200  | 0.1473          | 0.3842         | -1.5693          | 1.0                | 1.9535          | -200.5027      | -156.7917    | -1.0525         | -0.8591       |
| 0.1358        | 0.89  | 225  | 0.1231          | 0.4031         | -1.7582          | 1.0                | 2.1614          | -202.3919      | -156.6024    | -1.0523         | -0.8593       |
| 0.115         | 0.99  | 250  | 0.1076          | 0.4205         | -1.8980          | 1.0                | 2.3185          | -203.7897      | -156.4292    | -1.0521         | -0.8590       |
| 0.1111        | 1.09  | 275  | 0.0989          | 0.4291         | -1.9856          | 1.0                | 2.4148          | -204.6660      | -156.3426    | -1.0515         | -0.8591       |
| 0.0902        | 1.19  | 300  | 0.0949          | 0.4280         | -2.0337          | 1.0                | 2.4617          | -205.1465      | -156.3540    | -1.0507         | -0.8576       |
| 0.0867        | 1.29  | 325  | 0.0920          | 0.4325         | -2.0705          | 1.0                | 2.5030          | -205.5146      | -156.3087    | -1.0510         | -0.8576       |
| 0.0973        | 1.39  | 350  | 0.0905          | 0.4357         | -2.0839          | 1.0                | 2.5196          | -205.6485      | -156.2766    | -1.0506         | -0.8576       |
| 0.0942        | 1.49  | 375  | 0.0897          | 0.4422         | -2.0838          | 1.0                | 2.5260          | -205.6476      | -156.2122    | -1.0515         | -0.8578       |
| 0.0858        | 1.59  | 400  | 0.0897          | 0.4392         | -2.0903          | 1.0                | 2.5295          | -205.7121      | -156.2415    | -1.0515         | -0.8587       |
| 0.083         | 1.69  | 425  | 0.0893          | 0.4401         | -2.0972          | 1.0                | 2.5373          | -205.7811      | -156.2327    | -1.0511         | -0.8584       |
| 0.0964        | 1.79  | 450  | 0.0897          | 0.4368         | -2.0947          | 1.0                | 2.5315          | -205.7564      | -156.2662    | -1.0511         | -0.8577       |
| 0.0931        | 1.89  | 475  | 0.0890          | 0.4406         | -2.0970          | 1.0                | 2.5376          | -205.7794      | -156.2282    | -1.0512         | -0.8585       |
| 0.0915        | 1.99  | 500  | 0.0896          | 0.4401         | -2.0930          | 1.0                | 2.5330          | -205.7391      | -156.2334    | -1.0514         | -0.8587       |

### Framework versions

  • PEFT 0.8.2
  • Transformers 4.38.1
  • Pytorch 2.2.0+cu118
  • Datasets 2.17.1
  • Tokenizers 0.15.2