OpenELM-1_1B-DPO-new-3

This model was trained from scratch on an unknown dataset. At the final logged evaluation (step 2800, epoch 2.9304), it achieves the following results on the evaluation set:

Loss: 0.8334
Rewards/chosen: -9.6875
Rewards/rejected: -11.875
Rewards/accuracies: 0.7188
Rewards/margins: 2.2031
Logps/rejected: -1472.0
Logps/chosen: -1280.0
Logits/rejected: -1.8125
Logits/chosen: -3.4688
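
The metric names above match those logged for DPO-style preference training. As a rough illustration of how such numbers relate to one another, the sketch below computes the per-pair quantities under the standard sigmoid DPO loss. The beta value and all log-probabilities in it are hypothetical; the card reports neither beta nor the reference-model log-probabilities.

```python
import math

def dpo_pair_metrics(policy_chosen_logp, policy_rejected_logp,
                     ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO quantities from summed sequence log-probabilities.

    beta=0.1 is an assumed value; the card does not state it.
    """
    # Implicit reward: scaled log-ratio of policy to reference model.
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    # Sigmoid DPO loss: -log(sigmoid(margin)) == log(1 + exp(-margin)).
    loss = math.log1p(math.exp(-margin))
    return {
        "reward_chosen": reward_chosen,
        "reward_rejected": reward_rejected,
        "margin": margin,
        "loss": loss,
        "correct": margin > 0,  # averaged over pairs, this is Rewards/accuracies
    }

# Hypothetical log-probabilities, chosen only to make the arithmetic easy to follow.
example = dpo_pair_metrics(-120.0, -150.0, -100.0, -110.0)
print(example)
```

Read this way, a negative Rewards/chosen means the trained model assigns the chosen responses a lower log-probability than the reference model does, while a positive Rewards/margins means the rejected responses dropped further still. The difference of the reported means, -9.6875 - (-11.875) = 2.1875, is close to but not identical to the reported margin of 2.2031; the card does not explain the gap.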

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 16
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 2
total_train_batch_size: 64
total_eval_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3
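
The batch-size and scheduler settings above can be checked with a few lines of arithmetic. The sketch below recomputes the total train batch size and approximates a cosine schedule with linear warmup. The total step count of 2865 is an estimate derived from the results table (step 100 corresponds to epoch 0.1047), not a value stated in the card, and the schedule function is a simplified stand-in rather than the exact scheduler code used for training.

```python
import math

def effective_batch_size(per_device, num_devices, grad_accum_steps):
    """Examples consumed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum_steps

def cosine_lr(step, total_steps, peak_lr=5e-5, warmup_ratio=0.1):
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

TOTAL_STEPS = 2865  # assumption: roughly 955 steps per epoch times 3 epochs

print(effective_batch_size(8, 4, 2))   # train: 8 per device, 4 devices, 2 accumulation steps
print(effective_batch_size(16, 4, 1))  # eval: no gradient accumulation
for s in (0, 287, 1000, 2000, TOTAL_STEPS):
    print(s, cosine_lr(s, TOTAL_STEPS))
```

Both products come out to 64, matching total_train_batch_size and total_eval_batch_size above.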

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6101	0.1047	100	0.6189	-0.8633	-1.1562	0.6660	0.2930	-402.0	-400.0	-8.1875	-8.3125
0.6173	0.2093	200	0.6125	-1.2812	-1.6484	0.6660	0.3711	-452.0	-442.0	-12.9375	-12.75
0.6488	0.3140	300	0.6205	-2.0781	-2.5625	0.6484	0.4824	-544.0	-524.0	-12.6875	-12.75
0.5965	0.4186	400	0.6005	-2.3594	-3.0156	0.7148	0.6641	-588.0	-548.0	-11.4375	-11.4375
0.5706	0.5233	500	0.5759	-2.0312	-2.5781	0.6875	0.5391	-544.0	-520.0	-10.75	-11.125
0.5348	0.6279	600	0.5724	-3.3125	-4.0938	0.7109	0.7695	-696.0	-644.0	-7.7812	-8.5625
0.5711	0.7326	700	0.5678	-3.2656	-4.0625	0.7168	0.8047	-692.0	-640.0	-7.25	-8.125
0.5505	0.8373	800	0.5683	-2.7031	-3.4062	0.7070	0.7031	-628.0	-584.0	-8.875	-9.625
0.5827	0.9419	900	0.5685	-2.7812	-3.4688	0.7188	0.6797	-632.0	-592.0	-9.0	-10.0
0.2372	1.0466	1000	0.5875	-3.6719	-4.7188	0.7324	1.0312	-756.0	-684.0	-7.4375	-8.9375
0.1955	1.1512	1100	0.5973	-4.4375	-5.5625	0.7285	1.1094	-844.0	-760.0	-6.875	-8.4375
0.1976	1.2559	1200	0.6059	-5.125	-6.25	0.7324	1.1328	-912.0	-828.0	-4.6562	-6.1875
0.1999	1.3605	1300	0.6134	-5.7812	-6.8438	0.7109	1.0781	-972.0	-892.0	-4.875	-6.3125
0.1733	1.4652	1400	0.6179	-6.4375	-7.5625	0.6992	1.125	-1040.0	-956.0	-4.5	-5.75
0.1586	1.5699	1500	0.6041	-5.6562	-6.875	0.7188	1.2031	-972.0	-880.0	-6.75	-8.0
0.1939	1.6745	1600	0.6094	-5.7812	-6.9375	0.7285	1.1719	-980.0	-892.0	-6.0	-7.2812
0.1753	1.7792	1700	0.6206	-6.4062	-7.6875	0.7148	1.2891	-1056.0	-956.0	-4.7188	-6.0625
0.1609	1.8838	1800	0.6048	-6.0	-7.25	0.7266	1.25	-1012.0	-916.0	-5.3438	-6.6875
0.1532	1.9885	1900	0.6346	-6.9688	-8.375	0.7344	1.4141	-1128.0	-1012.0	-4.75	-6.1562
0.0151	2.0931	2000	0.7192	-8.0625	-9.8125	0.7246	1.7266	-1264.0	-1120.0	-3.6406	-5.25
0.0221	2.1978	2100	0.8640	-10.0625	-12.3125	0.7227	2.25	-1520.0	-1320.0	-2.2188	-3.8906
0.0351	2.3025	2200	0.7923	-8.875	-11.0	0.7246	2.0938	-1384.0	-1200.0	-2.6875	-4.3125
0.017	2.4071	2300	0.8024	-9.125	-11.25	0.7148	2.1094	-1416.0	-1232.0	-2.0312	-3.6562
0.0202	2.5118	2400	0.8169	-9.25	-11.375	0.7090	2.1094	-1424.0	-1240.0	-2.125	-3.8125
0.0122	2.6164	2500	0.8391	-9.75	-11.9375	0.7129	2.2031	-1480.0	-1288.0	-1.7188	-3.375
0.0173	2.7211	2600	0.8294	-9.625	-11.8125	0.7168	2.1875	-1464.0	-1280.0	-1.7891	-3.4375
0.0217	2.8257	2700	0.8316	-9.6875	-11.875	0.7188	2.2031	-1472.0	-1280.0	-1.7578	-3.4062
0.0179	2.9304	2800	0.8334	-9.6875	-11.875	0.7188	2.2031	-1472.0	-1280.0	-1.8125	-3.4688
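
One way to read the table is to compare training and validation loss across checkpoints. The sketch below uses a subset of rows copied from the table above (step, epoch, training loss, validation loss) to locate the lowest validation loss and to measure the gap at the last logged step. The row selection and variable names are illustrative only.

```python
# (step, epoch, training_loss, validation_loss), copied from selected table rows.
rows = [
    (100, 0.1047, 0.6101, 0.6189),
    (700, 0.7326, 0.5711, 0.5678),
    (900, 0.9419, 0.5827, 0.5685),
    (1000, 1.0466, 0.2372, 0.5875),
    (1900, 1.9885, 0.1532, 0.6346),
    (2000, 2.0931, 0.0151, 0.7192),
    (2800, 2.9304, 0.0179, 0.8334),
]

best = min(rows, key=lambda r: r[3])  # checkpoint with the lowest validation loss
final = rows[-1]
gap = final[3] - final[2]             # validation loss minus training loss at the end
steps_per_epoch = final[0] / final[1]

print(f"best validation loss {best[3]} at step {best[0]} (epoch {best[1]})")
print(f"final train/validation gap: {gap:.4f}")
print(f"approx. steps per epoch: {steps_per_epoch:.1f}")
```

In the full table, validation loss is lowest at step 700 (0.5678) and rises through the second and third epochs while training loss falls to about 0.02, a pattern that usually indicates overfitting to the training pairs. Rewards/accuracies stays near 0.72 over the same span.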

Framework versions

Transformers 4.44.2
Pytorch 2.3.0
Datasets 2.21.0
Tokenizers 0.19.1
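
To reproduce the environment, it can help to compare installed package versions against the ones listed above. The sketch below does this with the standard library's importlib.metadata. The helper names are hypothetical, and the package name "torch" is assumed to be the distribution name for the Pytorch entry.

```python
from importlib import metadata

# Versions listed in this card, keyed by pip distribution name.
EXPECTED = {
    "transformers": "4.44.2",
    "torch": "2.3.0",
    "datasets": "2.21.0",
    "tokenizers": "0.19.1",
}

def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

def version_mismatches(expected, lookup=installed_version):
    """Map each package whose installed version differs to what was found."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        # Builds may carry a local suffix such as "+cu121"; compare the public part.
        if found is None or found.split("+")[0] != wanted:
            mismatches[package] = found
    return mismatches

print(version_mismatches(EXPECTED))
```

An empty dictionary means every listed package matches; otherwise each key shows the version that was found, or None if the package is not installed.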

CharlesLi/OpenELM-1_1B-DPO-new-3
