---
license: llama2
library_name: peft
tags:
  - trl
  - dpo
  - generated_from_trainer
base_model: meta-llama/Llama-2-7b-hf
model-index:
  - name: llama_DPO_model_e2
    results: []
---

# llama_DPO_model_e2

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.0587
- Rewards/chosen: 0.4885
- Rewards/rejected: -2.5446
- Rewards/accuracies: 1.0
- Rewards/margins: 3.0331
- Logps/rejected: -210.2559
- Logps/chosen: -155.7489
- Logits/rejected: -1.0525
- Logits/chosen: -0.8603
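The rewards metrics above follow the standard DPO definitions: each reward is the beta-scaled log-probability ratio between the policy and the frozen reference model, and the loss is the negative log-sigmoid of the chosen-minus-rejected margin. As a hedged illustration (this is not the training code from this repository; the function name is ours), the figures can be related like so:

```python
import math

# Illustrative sketch of how the DPO metrics relate (standard DPO
# definitions, not this repository's training code).
# reward = beta * (policy log-prob - reference log-prob) per response;
# loss   = -log(sigmoid(reward_chosen - reward_rejected)).

def dpo_loss(reward_chosen: float, reward_rejected: float) -> float:
    margin = reward_chosen - reward_rejected  # "Rewards/margins"
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Plugging in the evaluation figures above:
# margin = 0.4885 - (-2.5446) = 3.0331, matching Rewards/margins.
margin = 0.4885 - (-2.5446)

# The reported loss (0.0587) is a batch average, so the value recomputed
# from the averaged rewards only approximates it.
approx_loss = dpo_loss(0.4885, -2.5446)
```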

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-06
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
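Two of these values are derived rather than independent, which a short sketch makes explicit (illustrative arithmetic only, not the training script; variable names are ours): the effective batch size is the per-device batch size times the gradient-accumulation steps, and the linear scheduler decays the learning rate from its initial value toward zero over the total optimizer steps (500 here, per the results table).

```python
# Illustrative arithmetic for the hyperparameters above (not the actual
# training script; variable names are ours).
train_batch_size = 1
gradient_accumulation_steps = 8
total_train_batch_size = train_batch_size * gradient_accumulation_steps  # 8

learning_rate = 1e-06
total_steps = 500  # final step reached in the results table

def linear_lr(step: int) -> float:
    # Linear decay from learning_rate at step 0 to 0 at total_steps
    # (ignoring any warmup, which the card does not report).
    return learning_rate * max(0.0, 1.0 - step / total_steps)
```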

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6664        | 0.1   | 25   | 0.6240          | 0.0413         | -0.1038          | 0.9633             | 0.1451          | -185.8477      | -160.2207    | -1.0521         | -0.8552       |
| 0.5275        | 0.2   | 50   | 0.4961          | 0.1194         | -0.3323          | 1.0                | 0.4517          | -188.1325      | -159.4397    | -1.0520         | -0.8543       |
| 0.4242        | 0.3   | 75   | 0.3772          | 0.1960         | -0.6107          | 1.0                | 0.8067          | -190.9165      | -158.6736    | -1.0530         | -0.8585       |
| 0.3194        | 0.4   | 100  | 0.2809          | 0.2609         | -0.9146          | 1.0                | 1.1755          | -193.9560      | -158.0250    | -1.0526         | -0.8576       |
| 0.2569        | 0.5   | 125  | 0.2098          | 0.3243         | -1.2033          | 1.0                | 1.5276          | -196.8424      | -157.3911    | -1.0523         | -0.8568       |
| 0.1815        | 0.6   | 150  | 0.1591          | 0.3689         | -1.4935          | 1.0                | 1.8624          | -199.7451      | -156.9453    | -1.0527         | -0.8590       |
| 0.1488        | 0.7   | 175  | 0.1233          | 0.4109         | -1.7538          | 1.0                | 2.1647          | -202.3471      | -156.5246    | -1.0528         | -0.8590       |
| 0.1097        | 0.79  | 200  | 0.0966          | 0.4448         | -2.0010          | 1.0                | 2.4458          | -204.8196      | -156.1859    | -1.0531         | -0.8595       |
| 0.0925        | 0.89  | 225  | 0.0804          | 0.4615         | -2.1974          | 1.0                | 2.6589          | -206.7837      | -156.0186    | -1.0534         | -0.8616       |
| 0.0748        | 0.99  | 250  | 0.0707          | 0.4708         | -2.3440          | 1.0                | 2.8148          | -208.2495      | -155.9261    | -1.0526         | -0.8606       |
| 0.0717        | 1.09  | 275  | 0.0649          | 0.4788         | -2.4354          | 1.0                | 2.9142          | -209.1637      | -155.8455    | -1.0523         | -0.8600       |
| 0.057         | 1.19  | 300  | 0.0616          | 0.4820         | -2.4896          | 1.0                | 2.9716          | -209.7052      | -155.8138    | -1.0532         | -0.8609       |
| 0.0543        | 1.29  | 325  | 0.0598          | 0.4864         | -2.5199          | 1.0                | 3.0064          | -210.0089      | -155.7695    | -1.0522         | -0.8598       |
| 0.0634        | 1.39  | 350  | 0.0591          | 0.4873         | -2.5345          | 1.0                | 3.0218          | -210.1548      | -155.7612    | -1.0529         | -0.8603       |
| 0.0614        | 1.49  | 375  | 0.0584          | 0.4896         | -2.5466          | 1.0                | 3.0362          | -210.2760      | -155.7379    | -1.0528         | -0.8597       |
| 0.0543        | 1.59  | 400  | 0.0580          | 0.4918         | -2.5464          | 1.0                | 3.0382          | -210.2738      | -155.7159    | -1.0528         | -0.8597       |
| 0.0532        | 1.69  | 425  | 0.0579          | 0.4902         | -2.5495          | 1.0                | 3.0397          | -210.3050      | -155.7321    | -1.0520         | -0.8605       |
| 0.0632        | 1.79  | 450  | 0.0577          | 0.4907         | -2.5514          | 1.0                | 3.0422          | -210.3238      | -155.7266    | -1.0522         | -0.8601       |
| 0.0596        | 1.89  | 475  | 0.0579          | 0.4923         | -2.5509          | 1.0                | 3.0432          | -210.3188      | -155.7112    | -1.0527         | -0.8614       |
| 0.0597        | 1.99  | 500  | 0.0587          | 0.4885         | -2.5446          | 1.0                | 3.0331          | -210.2559      | -155.7489    | -1.0525         | -0.8603       |

### Framework versions

- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2