llama_DPO_model_e2

This model is a DPO-trained PEFT adapter for meta-llama/Llama-2-7b-hf, fine-tuned on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1045
  • Rewards/chosen: 0.4197
  • Rewards/rejected: -1.9316
  • Rewards/accuracies: 1.0
  • Rewards/margins: 2.3513
  • Logps/rejected: -204.1257
  • Logps/chosen: -156.4368
  • Logits/rejected: -1.0515
  • Logits/chosen: -0.8584
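
The reward margin above is simply the gap between the mean reward of the chosen responses and the mean reward of the rejected ones, which can be verified directly from the reported metrics:

```python
# Reported evaluation metrics (from the list above).
rewards_chosen = 0.4197
rewards_rejected = -1.9316

# Rewards/margins = Rewards/chosen - Rewards/rejected
margin = round(rewards_chosen - rewards_rejected, 4)
# margin == 2.3513, matching the reported Rewards/margins
```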

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 7.5e-07
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 2
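
The `total_train_batch_size` in the list above follows from the other settings; assuming a single device, the effective batch size is the per-device batch size times the gradient accumulation steps:

```python
# Hyperparameters from the list above.
train_batch_size = 1
gradient_accumulation_steps = 8
num_devices = 1  # assumption: single-GPU training

# Effective (total) train batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
# total_train_batch_size == 8, matching the reported value
```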

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6732 | 0.1 | 25 | 0.6518 | 0.0274 | -0.0584 | 0.8867 | 0.0858 | -185.3935 | -160.3602 | -1.0521 | -0.8541 |
| 0.588 | 0.2 | 50 | 0.5616 | 0.0780 | -0.2093 | 0.9933 | 0.2873 | -186.9026 | -159.8541 | -1.0523 | -0.8550 |
| 0.5077 | 0.3 | 75 | 0.4690 | 0.1360 | -0.3896 | 1.0 | 0.5256 | -188.7056 | -159.2737 | -1.0525 | -0.8564 |
| 0.4179 | 0.4 | 100 | 0.3872 | 0.1873 | -0.5861 | 1.0 | 0.7734 | -190.6710 | -158.7608 | -1.0532 | -0.8563 |
| 0.3614 | 0.5 | 125 | 0.3170 | 0.2381 | -0.7895 | 1.0 | 1.0276 | -192.7043 | -158.2528 | -1.0533 | -0.8568 |
| 0.2812 | 0.6 | 150 | 0.2544 | 0.2856 | -1.0121 | 1.0 | 1.2977 | -194.9309 | -157.7783 | -1.0527 | -0.8569 |
| 0.2378 | 0.7 | 175 | 0.2066 | 0.3262 | -1.2240 | 1.0 | 1.5502 | -197.0494 | -157.3717 | -1.0520 | -0.8573 |
| 0.1866 | 0.79 | 200 | 0.1704 | 0.3591 | -1.4222 | 1.0 | 1.7812 | -199.0312 | -157.0431 | -1.0526 | -0.8577 |
| 0.1555 | 0.89 | 225 | 0.1429 | 0.3829 | -1.6050 | 1.0 | 1.9879 | -200.8594 | -156.8051 | -1.0523 | -0.8580 |
| 0.1312 | 0.99 | 250 | 0.1239 | 0.4002 | -1.7534 | 1.0 | 2.1536 | -202.3439 | -156.6322 | -1.0515 | -0.8572 |
| 0.1276 | 1.09 | 275 | 0.1147 | 0.4086 | -1.8325 | 1.0 | 2.2410 | -203.1341 | -156.5480 | -1.0518 | -0.8578 |
| 0.1038 | 1.19 | 300 | 0.1094 | 0.4144 | -1.8779 | 1.0 | 2.2923 | -203.5883 | -156.4901 | -1.0511 | -0.8574 |
| 0.101 | 1.29 | 325 | 0.1072 | 0.4191 | -1.9023 | 1.0 | 2.3214 | -203.8326 | -156.4429 | -1.0512 | -0.8569 |
| 0.1128 | 1.39 | 350 | 0.1056 | 0.4189 | -1.9206 | 1.0 | 2.3394 | -204.0154 | -156.4454 | -1.0511 | -0.8576 |
| 0.11 | 1.49 | 375 | 0.1047 | 0.4220 | -1.9262 | 1.0 | 2.3482 | -204.0712 | -156.4135 | -1.0509 | -0.8570 |
| 0.1001 | 1.59 | 400 | 0.1048 | 0.4224 | -1.9281 | 1.0 | 2.3505 | -204.0909 | -156.4098 | -1.0514 | -0.8574 |
| 0.0978 | 1.69 | 425 | 0.1042 | 0.4246 | -1.9292 | 1.0 | 2.3538 | -204.1014 | -156.3875 | -1.0512 | -0.8573 |
| 0.1111 | 1.79 | 450 | 0.1041 | 0.4244 | -1.9292 | 1.0 | 2.3536 | -204.1017 | -156.3903 | -1.0514 | -0.8587 |
| 0.1064 | 1.89 | 475 | 0.1044 | 0.4199 | -1.9317 | 1.0 | 2.3516 | -204.1266 | -156.4352 | -1.0514 | -0.8577 |
| 0.107 | 1.99 | 500 | 0.1045 | 0.4197 | -1.9316 | 1.0 | 2.3513 | -204.1257 | -156.4368 | -1.0515 | -0.8584 |
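
The reported rewards already include the DPO β scaling, so the per-pair DPO loss is −log σ(rewards_chosen − rewards_rejected). A rough sanity check on the final reward margin (note: the reported validation loss is averaged over individual pairs, so plugging in the averaged margin gives only an approximation, not the exact 0.1045):

```python
import math

def dpo_pair_loss(margin: float) -> float:
    """Per-pair DPO loss: -log(sigmoid(margin)), written as
    softplus(-margin) for numerical stability."""
    return math.log1p(math.exp(-margin))

# Final reported reward margin from the table above.
loss = dpo_pair_loss(2.3513)
# loss is roughly 0.091, in the same ballpark as the reported 0.1045
```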

Framework versions

  • PEFT 0.8.2
  • Transformers 4.38.1
  • Pytorch 2.2.0+cu118
  • Datasets 2.17.1
  • Tokenizers 0.15.2