OpenELM-1_1B-DPO-full-least-similar

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 1.0828
Rewards/chosen: -3.9844
Rewards/rejected: -4.25
Rewards/accuracies: 0.5
Rewards/margins: 0.25
Logps/rejected: -712.0
Logps/chosen: -716.0
Logits/rejected: -7.0
Logits/chosen: -7.875
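
The reward metrics above are related to one another, and a small sketch may make the relationship easier to read. It assumes the usual DPO conventions: the margin is the chosen reward minus the rejected reward, accuracy is the fraction of pairs with a positive margin, and the per-pair loss is the negative log-sigmoid of the margin. The function name `dpo_metrics` and the input values are hypothetical and are not taken from this model's evaluation set. The card reports a margin of 0.25, while the difference of the reported means is 0.2656; the card does not explain the gap, and reduced-precision logging is one possible cause.

```python
import math


def dpo_metrics(chosen_rewards, rejected_rewards):
    """Summarise per-pair implicit rewards under the metric names used above.

    Assumes the standard DPO definitions; the inputs are plain lists of floats.
    """
    margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    n = len(margins)

    def neg_log_sigmoid(m):
        # Numerically stable softplus(-m) = -log(sigmoid(m)).
        return max(-m, 0.0) + math.log1p(math.exp(-abs(m)))

    return {
        "rewards/chosen": sum(chosen_rewards) / n,
        "rewards/rejected": sum(rejected_rewards) / n,
        "rewards/margins": sum(margins) / n,
        "rewards/accuracies": sum(m > 0 for m in margins) / n,
        "loss": sum(neg_log_sigmoid(m) for m in margins) / n,
    }


# Hypothetical toy pairs, chosen only so the means resemble the scale above.
metrics = dpo_metrics([-3.0, -5.0], [-4.5, -4.0])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In this toy case the mean margin is positive while only half of the pairs are ranked correctly, and the mean loss is above log 2 (about 0.693), the value a model with zero margin on every pair would score. The evaluation numbers above show a similar combination of a small positive margin, an accuracy of 0.5, and a loss above log 2.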

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 16
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 2
total_train_batch_size: 64
total_eval_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3
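
The total batch sizes follow from the per-device sizes, the device count, and the accumulation steps listed above. The sketch below reproduces that arithmetic and also illustrates a cosine schedule with linear warmup using the listed peak learning rate and warmup ratio. The function names, the rounding of the warmup step count, and the `total_steps=1000` value are assumptions for illustration; the card does not state the total number of optimizer steps.

```python
import math


def effective_batch_size(per_device, num_devices, grad_accum_steps=1):
    """Examples consumed per optimizer step (or per eval step if accum is 1)."""
    return per_device * num_devices * grad_accum_steps


def cosine_lr_with_warmup(step, total_steps, peak_lr=5e-05, warmup_ratio=0.1):
    """Linear warmup to peak_lr, then cosine decay to zero.

    A sketch of the general technique, not the training code used here.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumed rounding
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Values from the hyperparameter list above.
train_total = effective_batch_size(per_device=8, num_devices=4, grad_accum_steps=2)
eval_total = effective_batch_size(per_device=16, num_devices=4)
print(train_total, eval_total)

# Hypothetical run length, used only to show the shape of the schedule.
for step in (0, 50, 100, 550, 1000):
    print(step, cosine_lr_with_warmup(step, total_steps=1000))
```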

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.1874	0.1047	100	0.6751	-0.4551	-0.5703	0.5430	0.1152	-346.0	-364.0	-14.4375	-14.5625
0.1191	0.2094	200	0.7004	-0.7305	-0.8242	0.5215	0.0962	-372.0	-392.0	-12.4375	-12.625
0.1411	0.3141	300	0.7418	-0.9688	-1.0547	0.4766	0.0894	-394.0	-416.0	-12.25	-12.625
0.1554	0.4188	400	0.7757	-1.8906	-1.9609	0.4941	0.0688	-484.0	-508.0	-12.0625	-12.4375
0.1334	0.5236	500	0.8440	-1.8906	-1.9297	0.4805	0.0405	-482.0	-508.0	-14.75	-15.0625
0.1429	0.6283	600	0.8180	-1.6406	-1.6875	0.5078	0.0515	-458.0	-482.0	-14.625	-14.875
0.1228	0.7330	700	0.8036	-1.5625	-1.7266	0.5	0.1611	-460.0	-474.0	-14.0625	-14.4375
0.1182	0.8377	800	0.8425	-1.8438	-1.9141	0.5078	0.0664	-480.0	-504.0	-14.875	-15.125
0.1471	0.9424	900	0.8238	-2.4219	-2.5312	0.5039	0.0991	-540.0	-560.0	-13.625	-13.9375
0.0251	1.0471	1000	0.8727	-2.2969	-2.4219	0.4805	0.1289	-532.0	-548.0	-14.0	-14.25
0.033	1.1518	1100	0.8287	-2.0625	-2.1094	0.5078	0.0503	-500.0	-524.0	-13.75	-14.125
0.0204	1.2565	1200	0.8519	-2.3281	-2.4688	0.5312	0.1377	-536.0	-552.0	-11.125	-11.9375
0.0168	1.3613	1300	0.8707	-2.8906	-3.0625	0.5098	0.1748	-596.0	-608.0	-8.9375	-9.8125
0.0183	1.4660	1400	0.9055	-2.625	-2.7344	0.5	0.1172	-564.0	-580.0	-11.5625	-12.125
0.0258	1.5707	1500	0.8797	-2.3906	-2.4531	0.5098	0.0630	-536.0	-556.0	-12.1875	-12.625
0.0168	1.6754	1600	0.9114	-2.9844	-3.125	0.5059	0.1338	-600.0	-616.0	-10.4375	-11.0
0.0313	1.7801	1700	0.9136	-2.6562	-2.7344	0.5020	0.0781	-564.0	-584.0	-8.5625	-9.25
0.0207	1.8848	1800	0.9314	-3.0781	-3.2188	0.5059	0.1289	-612.0	-628.0	-8.6875	-9.5
0.0155	1.9895	1900	0.9222	-3.2344	-3.375	0.5059	0.1416	-628.0	-640.0	-5.75	-6.6562
0.0013	2.0942	2000	0.9954	-3.4844	-3.6719	0.5020	0.1885	-656.0	-668.0	-6.625	-7.5312
0.0018	2.1990	2100	1.0399	-3.6562	-3.875	0.4980	0.2119	-676.0	-684.0	-6.75	-7.5938
0.0012	2.3037	2200	1.0474	-3.8125	-4.0312	0.5	0.2363	-692.0	-700.0	-7.25	-8.0625
0.0012	2.4084	2300	1.0703	-3.9531	-4.1875	0.4922	0.2451	-708.0	-712.0	-7.125	-7.9688
0.0014	2.5131	2400	1.0872	-4.0312	-4.3125	0.4980	0.2598	-720.0	-724.0	-7.125	-8.0
0.0013	2.6178	2500	1.0783	-3.9688	-4.2188	0.5020	0.2490	-712.0	-716.0	-6.9375	-7.8125
0.001	2.7225	2600	1.0849	-4.0	-4.25	0.5020	0.2520	-712.0	-720.0	-6.9688	-7.8438
0.0008	2.8272	2700	1.0824	-3.9844	-4.25	0.4980	0.2520	-712.0	-716.0	-7.0	-7.875
0.0007	2.9319	2800	1.0828	-3.9844	-4.25	0.5	0.25	-712.0	-716.0	-7.0	-7.875
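
In the table above, training loss falls from 0.1874 to 0.0007 while validation loss rises from 0.6751 to 1.0828 and reward accuracy stays close to 0.5. That pattern is often read as a sign of overfitting, though the card itself draws no conclusion. The sketch below picks the row with the lowest validation loss from a subset of the logged rows and compares the gap between validation and training loss at that row and at the final row. The helper names are hypothetical; the numbers are copied from the table.

```python
# (step, training loss, validation loss) copied from a subset of the rows above.
ROWS = [
    (100, 0.1874, 0.6751),
    (500, 0.1334, 0.8440),
    (1000, 0.0251, 0.8727),
    (1500, 0.0258, 0.8797),
    (2000, 0.0013, 0.9954),
    (2500, 0.0013, 1.0783),
    (2800, 0.0007, 1.0828),
]


def best_checkpoint(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


def generalization_gap(row):
    """Validation loss minus training loss for one row."""
    _, train_loss, val_loss = row
    return val_loss - train_loss


best = best_checkpoint(ROWS)
final = ROWS[-1]
print(f"lowest validation loss: step {best[0]} ({best[2]})")
print(f"gap at step {best[0]}: {generalization_gap(best):.4f}")
print(f"gap at step {final[0]}: {generalization_gap(final):.4f}")
```

Selecting a checkpoint by validation loss is only one possible criterion; for preference tuning, reward accuracy or margin on held-out pairs may matter more.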

Framework versions

Transformers 4.45.1
PyTorch 2.3.0
Datasets 3.0.1
Tokenizers 0.20.0

CharlesLi/OpenELM-1_1B-DPO-full-least-similar
