---
license: llama2
library_name: peft
tags:
  - trl
  - dpo
  - generated_from_trainer
base_model: meta-llama/Llama-2-7b-hf
model-index:
  - name: llama_DPO_model_e2
    results: []
---

# llama_DPO_model_e2

This model is a PEFT adapter for meta-llama/Llama-2-7b-hf, fine-tuned with DPO (Direct Preference Optimization) on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0037
  • Rewards/chosen: 0.5612
  • Rewards/rejected: -5.9460
  • Rewards/accuracies: 1.0
  • Rewards/margins: 6.5073
  • Logps/rejected: -244.2698
  • Logps/chosen: -155.0214
  • Logits/rejected: -1.0632
  • Logits/chosen: -0.8795
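The reward metrics above are DPO's implicit rewards: beta-scaled log-probability ratios between the policy and the frozen reference model, with the margin being the chosen reward minus the rejected reward (note 0.5612 − (−5.9460) ≈ 6.5073, matching the reported margin). A minimal sketch of this arithmetic, assuming the standard DPO formulation (the card does not report beta; 0.1 is TRL's default and is only an assumption here):

```python
import math

def dpo_rewards(beta, logp_chosen, ref_logp_chosen, logp_rejected, ref_logp_rejected):
    """Implicit DPO rewards: beta-scaled policy-vs-reference log-prob ratios.

    beta is assumed (e.g. TRL's default 0.1); it is not reported in the card.
    """
    r_chosen = beta * (logp_chosen - ref_logp_chosen)
    r_rejected = beta * (logp_rejected - ref_logp_rejected)
    margin = r_chosen - r_rejected
    # Per-pair DPO loss: -log(sigmoid(margin)); it shrinks as the margin grows.
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return r_chosen, r_rejected, margin, loss

# Sanity check against the reported eval metrics: margins = chosen - rejected.
assert abs((0.5612 - (-5.9460)) - 6.5073) < 1e-3
```

With "Rewards/accuracies" at 1.0, every chosen completion has a higher implicit reward than its rejected counterpart, which is consistent with the near-zero validation loss.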

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 2
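
The hyperparameters above can be reconstructed as a `TrainingArguments` sketch. This is an assumption-laden illustration, not the author's actual script: dataset loading, model/tokenizer setup, and the `DPOTrainer` call are elided, and the API shown is the TRL generation contemporary with the PEFT 0.8.x / Transformers 4.38 versions listed below.

```python
# Sketch only: reconstructs the reported hyperparameters; everything else
# (output_dir name, dataset, beta) is an assumption, not from the card.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="llama_DPO_model_e2",  # assumed name
    learning_rate=5e-6,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,    # effective train batch size: 1 * 8 = 8
    num_train_epochs=2,
    lr_scheduler_type="linear",
    seed=42,
)
# trainer = DPOTrainer(model, ref_model, args=args,
#                      train_dataset=..., eval_dataset=..., tokenizer=...)
```

Note that the "total_train_batch_size: 8" bullet is simply the per-device batch size (1) multiplied by the gradient-accumulation steps (8).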

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.3693        | 0.1   | 25   | 0.1586          | 0.3906         | -1.4782          | 1.0                | 1.8688          | -199.5915      | -156.7276    | -1.0532         | -0.8639       |
| 0.0442        | 0.2   | 50   | 0.0275          | 0.5577         | -3.3969          | 1.0                | 3.9546          | -218.7789      | -155.0573    | -1.0591         | -0.8709       |
| 0.0153        | 0.3   | 75   | 0.0123          | 0.5805         | -4.3685          | 1.0                | 4.9490          | -228.4945      | -154.8291    | -1.0641         | -0.8765       |
| 0.0098        | 0.4   | 100  | 0.0083          | 0.5880         | -4.8560          | 1.0                | 5.4440          | -233.3696      | -154.7535    | -1.0654         | -0.8801       |
| 0.0072        | 0.5   | 125  | 0.0065          | 0.5779         | -5.1733          | 1.0                | 5.7513          | -236.5429      | -154.8546    | -1.0667         | -0.8808       |
| 0.0056        | 0.6   | 150  | 0.0058          | 0.5669         | -5.3483          | 1.0                | 5.9152          | -238.2926      | -154.9651    | -1.0674         | -0.8815       |
| 0.0059        | 0.7   | 175  | 0.0051          | 0.5733         | -5.4970          | 1.0                | 6.0704          | -239.7797      | -154.9004    | -1.0659         | -0.8820       |
| 0.0065        | 0.79  | 200  | 0.0047          | 0.5713         | -5.6304          | 1.0                | 6.2017          | -241.1136      | -154.9210    | -1.0653         | -0.8803       |
| 0.0044        | 0.89  | 225  | 0.0043          | 0.5689         | -5.7514          | 1.0                | 6.3203          | -242.3240      | -154.9452    | -1.0650         | -0.8816       |
| 0.004         | 0.99  | 250  | 0.0041          | 0.5671         | -5.8118          | 1.0                | 6.3790          | -242.9280      | -154.9625    | -1.0644         | -0.8796       |
| 0.0029        | 1.09  | 275  | 0.0040          | 0.5648         | -5.8589          | 1.0                | 6.4237          | -243.3990      | -154.9863    | -1.0633         | -0.8800       |
| 0.0035        | 1.19  | 300  | 0.0038          | 0.5658         | -5.8892          | 1.0                | 6.4549          | -243.7013      | -154.9761    | -1.0630         | -0.8785       |
| 0.0024        | 1.29  | 325  | 0.0039          | 0.5618         | -5.9044          | 1.0                | 6.4662          | -243.8535      | -155.0163    | -1.0628         | -0.8787       |
| 0.0034        | 1.39  | 350  | 0.0038          | 0.5595         | -5.9136          | 1.0                | 6.4731          | -243.9456      | -155.0389    | -1.0632         | -0.8788       |
| 0.0029        | 1.49  | 375  | 0.0038          | 0.5601         | -5.9328          | 1.0                | 6.4929          | -244.1375      | -155.0332    | -1.0634         | -0.8792       |
| 0.003         | 1.59  | 400  | 0.0038          | 0.5605         | -5.9352          | 1.0                | 6.4957          | -244.1614      | -155.0284    | -1.0632         | -0.8793       |
| 0.0021        | 1.69  | 425  | 0.0038          | 0.5593         | -5.9410          | 1.0                | 6.5003          | -244.2199      | -155.0412    | -1.0630         | -0.8792       |
| 0.0036        | 1.79  | 450  | 0.0038          | 0.5605         | -5.9408          | 1.0                | 6.5013          | -244.2178      | -155.0292    | -1.0631         | -0.8794       |
| 0.0031        | 1.89  | 475  | 0.0038          | 0.5567         | -5.9439          | 1.0                | 6.5006          | -244.2483      | -155.0666    | -1.0634         | -0.8782       |
| 0.0032        | 1.99  | 500  | 0.0037          | 0.5612         | -5.9460          | 1.0                | 6.5073          | -244.2698      | -155.0214    | -1.0632         | -0.8795       |

### Framework versions

  • PEFT 0.8.2
  • Transformers 4.38.1
  • Pytorch 2.2.0+cu118
  • Datasets 2.17.1
  • Tokenizers 0.15.2