
llama_DPO_model_e3

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0722
  • Rewards/chosen: 0.4618
  • Rewards/rejected: -2.3246
  • Rewards/accuracies: 1.0
  • Rewards/margins: 2.7864
  • Logps/rejected: -208.0558
  • Logps/chosen: -156.0157
  • Logits/rejected: -1.0512
  • Logits/chosen: -0.8590
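
The reward metrics above follow from the DPO objective: each reward is the β-scaled log-probability ratio between the trained policy and the frozen reference model, and the loss is the negative log-sigmoid of the chosen-vs-rejected margin. A minimal sketch of that arithmetic; β and the reference log-probabilities below are assumptions chosen only to roughly reproduce the final metrics, since neither is stated in this card:

```python
import math

def dpo_stats(policy_chosen_logp, policy_rejected_logp,
              ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Compute DPO rewards and loss from sequence log-probabilities.

    Rewards are beta-scaled log-ratios between the policy and the frozen
    reference model; the loss is -log sigmoid(rewards margin).
    """
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
    return reward_chosen, reward_rejected, margin, loss

# Hypothetical log-probs in the same ballpark as Logps/chosen and
# Logps/rejected above; the reference values are invented for illustration.
rc, rr, m, loss = dpo_stats(-156.0, -208.1, -160.6, -184.9)
```

With these inputs the sketch yields rewards near 0.46 and -2.32 and a margin near 2.78, mirroring the shape of the evaluation metrics.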

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 7e-07
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3
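
The hyperparameters above can be mapped onto a trl DPO training setup. The sketch below is a hypothetical reconstruction, not the card author's script: argument names vary across trl versions, and `beta`, the model, tokenizer, and dataset objects are all assumptions not stated in the card.

```python
from transformers import TrainingArguments
from trl import DPOTrainer

args = TrainingArguments(
    output_dir="llama_DPO_model_e3",
    learning_rate=7e-7,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,  # effective train batch size: 1 * 8 = 8
    num_train_epochs=3,
    lr_scheduler_type="linear",
    seed=42,
)

trainer = DPOTrainer(
    model=model,              # PEFT-wrapped Llama-2-7b policy (assumed)
    ref_model=None,           # with a PEFT adapter, trl can recover the reference from the base weights
    args=args,
    beta=0.1,                 # assumed; the card does not state beta
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```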

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.675 | 0.1 | 25 | 0.6531 | 0.0248 | -0.0584 | 0.8667 | 0.0832 | -185.3936 | -160.3859 | -1.0523 | -0.8549 |
| 0.5865 | 0.2 | 50 | 0.5720 | 0.0730 | -0.1895 | 0.9933 | 0.2625 | -186.7048 | -159.9039 | -1.0525 | -0.8552 |
| 0.5203 | 0.3 | 75 | 0.4808 | 0.1258 | -0.3673 | 1.0 | 0.4931 | -188.4825 | -159.3763 | -1.0520 | -0.8543 |
| 0.4291 | 0.4 | 100 | 0.3986 | 0.1804 | -0.5547 | 1.0 | 0.7352 | -190.3568 | -158.8295 | -1.0527 | -0.8559 |
| 0.3712 | 0.5 | 125 | 0.3264 | 0.2303 | -0.7594 | 1.0 | 0.9897 | -192.4033 | -158.3308 | -1.0528 | -0.8572 |
| 0.2856 | 0.6 | 150 | 0.2612 | 0.2765 | -0.9893 | 1.0 | 1.2658 | -194.7025 | -157.8685 | -1.0531 | -0.8592 |
| 0.2433 | 0.7 | 175 | 0.2086 | 0.3223 | -1.2201 | 1.0 | 1.5424 | -197.0102 | -157.4110 | -1.0526 | -0.8573 |
| 0.1822 | 0.79 | 200 | 0.1673 | 0.3627 | -1.4385 | 1.0 | 1.8012 | -199.1950 | -157.0071 | -1.0529 | -0.8606 |
| 0.1511 | 0.89 | 225 | 0.1354 | 0.3921 | -1.6585 | 1.0 | 2.0506 | -201.3948 | -156.7133 | -1.0522 | -0.8601 |
| 0.1211 | 0.99 | 250 | 0.1134 | 0.4119 | -1.8492 | 1.0 | 2.2612 | -203.3017 | -156.5144 | -1.0526 | -0.8591 |
| 0.113 | 1.09 | 275 | 0.0999 | 0.4261 | -1.9792 | 1.0 | 2.4054 | -204.6017 | -156.3724 | -1.0511 | -0.8578 |
| 0.087 | 1.19 | 300 | 0.0912 | 0.4374 | -2.0704 | 1.0 | 2.5078 | -205.5134 | -156.2602 | -1.0521 | -0.8612 |
| 0.0808 | 1.29 | 325 | 0.0846 | 0.4439 | -2.1510 | 1.0 | 2.5949 | -206.3199 | -156.1949 | -1.0515 | -0.8600 |
| 0.0875 | 1.39 | 350 | 0.0814 | 0.4537 | -2.1942 | 1.0 | 2.6479 | -206.7517 | -156.0968 | -1.0520 | -0.8589 |
| 0.0826 | 1.49 | 375 | 0.0785 | 0.4559 | -2.2325 | 1.0 | 2.6884 | -207.1346 | -156.0752 | -1.0516 | -0.8585 |
| 0.0717 | 1.59 | 400 | 0.0768 | 0.4564 | -2.2611 | 1.0 | 2.7175 | -207.4205 | -156.0697 | -1.0517 | -0.8595 |
| 0.0694 | 1.69 | 425 | 0.0750 | 0.4602 | -2.2778 | 1.0 | 2.7380 | -207.5878 | -156.0322 | -1.0516 | -0.8590 |
| 0.0809 | 1.79 | 450 | 0.0739 | 0.4647 | -2.2925 | 1.0 | 2.7572 | -207.7341 | -155.9865 | -1.0514 | -0.8586 |
| 0.0747 | 1.89 | 475 | 0.0736 | 0.4595 | -2.3075 | 1.0 | 2.7670 | -207.8848 | -156.0394 | -1.0515 | -0.8584 |
| 0.0751 | 1.99 | 500 | 0.0726 | 0.4643 | -2.3130 | 1.0 | 2.7773 | -207.9396 | -155.9911 | -1.0516 | -0.8589 |
| 0.069 | 2.09 | 525 | 0.0725 | 0.4608 | -2.3223 | 1.0 | 2.7831 | -208.0324 | -156.0257 | -1.0512 | -0.8598 |
| 0.0658 | 2.19 | 550 | 0.0724 | 0.4670 | -2.3178 | 1.0 | 2.7847 | -207.9872 | -155.9642 | -1.0514 | -0.8580 |
| 0.0659 | 2.29 | 575 | 0.0720 | 0.4650 | -2.3217 | 1.0 | 2.7867 | -208.0269 | -155.9841 | -1.0516 | -0.8592 |
| 0.0732 | 2.38 | 600 | 0.0725 | 0.4585 | -2.3236 | 1.0 | 2.7821 | -208.0455 | -156.0485 | -1.0511 | -0.8591 |
| 0.0802 | 2.48 | 625 | 0.0723 | 0.4611 | -2.3249 | 1.0 | 2.7859 | -208.0582 | -156.0233 | -1.0511 | -0.8582 |
| 0.0734 | 2.58 | 650 | 0.0723 | 0.4646 | -2.3213 | 1.0 | 2.7859 | -208.0227 | -155.9879 | -1.0510 | -0.8591 |
| 0.068 | 2.68 | 675 | 0.0723 | 0.4627 | -2.3230 | 1.0 | 2.7857 | -208.0397 | -156.0069 | -1.0512 | -0.8585 |
| 0.0708 | 2.78 | 700 | 0.0720 | 0.4617 | -2.3278 | 1.0 | 2.7895 | -208.0874 | -156.0165 | -1.0508 | -0.8592 |
| 0.0621 | 2.88 | 725 | 0.0719 | 0.4613 | -2.3296 | 1.0 | 2.7909 | -208.1059 | -156.0208 | -1.0511 | -0.8585 |
| 0.0708 | 2.98 | 750 | 0.0722 | 0.4618 | -2.3246 | 1.0 | 2.7864 | -208.0558 | -156.0157 | -1.0512 | -0.8590 |

Framework versions

  • PEFT 0.8.2
  • Transformers 4.38.1
  • Pytorch 2.2.0+cu118
  • Datasets 2.17.1
  • Tokenizers 0.15.2
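
Since this model is a PEFT adapter on top of meta-llama/Llama-2-7b-hf, loading typically goes through the peft library. A hedged usage sketch; the adapter repo id below is a placeholder, not a real identifier from this card:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-2-7b-hf"
base_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Placeholder repo id: substitute the actual hub location of this adapter.
model = PeftModel.from_pretrained(base_model, "<user>/llama_DPO_model_e3")
```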
