tinyllama-1.1b-sum-dpo-full_LR2e-7_3epochs_old

This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-full_old on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set (the reward metrics are defined in the sketch after this list):

  • Loss: 0.6307
  • Rewards/chosen: -1.4504
  • Rewards/rejected: -1.8097
  • Rewards/accuracies: 0.6434
  • Rewards/margins: 0.3593
  • Logps/rejected: -244.1550
  • Logps/chosen: -203.7530
  • Logits/rejected: -1.7026
  • Logits/chosen: -1.7263
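
For reference, the reward columns follow the implicit-reward convention used by DPO trainers such as TRL's DPOTrainer (an assumption; the card does not name the trainer or its beta value): each reward is beta times the gap between the policy and reference log-probabilities of a completion. A minimal sketch of how the logged quantities relate:

```python
import torch
import torch.nn.functional as F

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # beta=0.1 is an assumption; the card does not state the DPO beta.
    # Implicit DPO rewards: beta * (policy log-prob - reference log-prob).
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    # "Rewards/margins": chosen reward minus rejected reward.
    margins = rewards_chosen - rewards_rejected
    # "Rewards/accuracies": fraction of pairs where the chosen summary wins.
    accuracies = (rewards_chosen > rewards_rejected).float().mean()
    # Standard DPO loss: -log(sigmoid(margin)), averaged over the batch.
    loss = -F.logsigmoid(margins).mean()
    return loss, rewards_chosen.mean(), rewards_rejected.mean(), margins.mean(), accuracies
```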

Model description

More information needed

Intended uses & limitations

More information needed
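
Until the authors add details here, the checkpoint should load like any causal LM on the Hub. A minimal generation sketch (the TL;DR-style prompt format is an assumption based on the summarization dataset, not something the card documents):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martimfasantos/tinyllama-1.1b-sum-dpo-full_LR2e-7_3epochs_old"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Hypothetical prompt format; adjust to whatever the SFT stage actually used.
prompt = "Summarize the following post:\n\n<post text>\n\nTL;DR:"
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```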

Training and evaluation data

More information needed
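
The card names the dataset but not the configuration or splits used. A minimal loading sketch, assuming the "comparisons" configuration (the one containing human preference pairs):

```python
from datasets import load_dataset

# "comparisons" holds pairs of candidate summaries with a human preference label;
# which config/split was actually used for training is an assumption.
ds = load_dataset("openai/summarize_from_feedback", "comparisons")
example = ds["train"][0]
print(example["summaries"])  # candidate summaries for one post
print(example["choice"])     # index of the human-preferred summary
```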

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 2e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
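
These settings map directly onto transformers' TrainingArguments; a minimal sketch of the equivalent configuration (output_dir is illustrative, and precision/world-size settings are omitted since the card only says "multi-GPU"):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="tinyllama-1.1b-sum-dpo-full_LR2e-7_3epochs_old",  # illustrative
    learning_rate=2e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # 8 per device x 2 steps = 16 effective
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    # The default optimizer matches the settings listed above:
    # Adam-style updates with betas=(0.9, 0.999) and eps=1e-8.
)
```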

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6931 | 0.0689 | 400 | 0.6932 | 0.0002 | 0.0003 | 0.4654 | -0.0001 | -63.1542 | -58.6924 | -3.1574 | -3.1630 |
| 0.692 | 0.1378 | 800 | 0.6928 | 0.0015 | 0.0008 | 0.5525 | 0.0007 | -63.0955 | -58.5586 | -3.1518 | -3.1574 |
| 0.6902 | 0.2068 | 1200 | 0.6914 | 0.0009 | -0.0027 | 0.5876 | 0.0037 | -63.4527 | -58.6187 | -3.1281 | -3.1338 |
| 0.6835 | 0.2757 | 1600 | 0.6888 | -0.0225 | -0.0320 | 0.5864 | 0.0096 | -66.3833 | -60.9598 | -3.0838 | -3.0895 |
| 0.6778 | 0.3446 | 2000 | 0.6845 | -0.0724 | -0.0918 | 0.5976 | 0.0194 | -72.3574 | -65.9486 | -3.0213 | -3.0270 |
| 0.6688 | 0.4135 | 2400 | 0.6792 | -0.1403 | -0.1725 | 0.6032 | 0.0323 | -80.4345 | -72.7375 | -2.9370 | -2.9428 |
| 0.6675 | 0.4824 | 2800 | 0.6732 | -0.2283 | -0.2756 | 0.6057 | 0.0472 | -90.7353 | -81.5436 | -2.8576 | -2.8635 |
| 0.6437 | 0.5513 | 3200 | 0.6646 | -0.3557 | -0.4265 | 0.6120 | 0.0708 | -105.8322 | -94.2796 | -2.7546 | -2.7607 |
| 0.6516 | 0.6203 | 3600 | 0.6602 | -0.4125 | -0.4982 | 0.6178 | 0.0856 | -112.9954 | -99.9643 | -2.6547 | -2.6612 |
| 0.6264 | 0.6892 | 4000 | 0.6514 | -0.5858 | -0.7050 | 0.6315 | 0.1192 | -133.6785 | -117.2944 | -2.5252 | -2.5324 |
| 0.6109 | 0.7581 | 4400 | 0.6474 | -0.6217 | -0.7587 | 0.6313 | 0.1370 | -139.0484 | -120.8850 | -2.4041 | -2.4124 |
| 0.6153 | 0.8270 | 4800 | 0.6432 | -0.7112 | -0.8720 | 0.6266 | 0.1608 | -150.3814 | -129.8305 | -2.3206 | -2.3302 |
| 0.6107 | 0.8959 | 5200 | 0.6407 | -0.7470 | -0.9249 | 0.6350 | 0.1779 | -155.6741 | -133.4166 | -2.2363 | -2.2476 |
| 0.6061 | 0.9649 | 5600 | 0.6392 | -0.7851 | -0.9723 | 0.6315 | 0.1871 | -160.4070 | -137.2255 | -2.1733 | -2.1859 |
| 0.5701 | 1.0338 | 6000 | 0.6356 | -1.0035 | -1.2450 | 0.6292 | 0.2415 | -187.6758 | -159.0581 | -2.0122 | -2.0292 |
| 0.5557 | 1.1027 | 6400 | 0.6358 | -1.0296 | -1.2785 | 0.6322 | 0.2489 | -191.0262 | -161.6682 | -1.9777 | -1.9953 |
| 0.5292 | 1.1716 | 6800 | 0.6333 | -1.0878 | -1.3492 | 0.6313 | 0.2614 | -198.1001 | -167.4900 | -1.8969 | -1.9159 |
| 0.5473 | 1.2405 | 7200 | 0.6354 | -1.0479 | -1.2958 | 0.6262 | 0.2479 | -192.7597 | -163.5001 | -1.9044 | -1.9226 |
| 0.6231 | 1.3094 | 7600 | 0.6346 | -1.2184 | -1.4979 | 0.6289 | 0.2795 | -212.9705 | -180.5535 | -1.8355 | -1.8558 |
| 0.5403 | 1.3784 | 8000 | 0.6339 | -1.1437 | -1.4111 | 0.6264 | 0.2673 | -204.2867 | -173.0842 | -1.8647 | -1.8848 |
| 0.5444 | 1.4473 | 8400 | 0.6339 | -1.0726 | -1.3310 | 0.6287 | 0.2584 | -196.2827 | -165.9765 | -1.8568 | -1.8768 |
| 0.5766 | 1.5162 | 8800 | 0.6329 | -1.0364 | -1.2879 | 0.6336 | 0.2516 | -191.9749 | -162.3483 | -1.8819 | -1.9009 |
| 0.525 | 1.5851 | 9200 | 0.6320 | -1.1870 | -1.4611 | 0.6366 | 0.2740 | -209.2869 | -177.4161 | -1.8122 | -1.8325 |
| 0.5174 | 1.6540 | 9600 | 0.6310 | -1.2662 | -1.5606 | 0.6375 | 0.2944 | -219.2438 | -185.3348 | -1.7597 | -1.7810 |
| 0.5312 | 1.7229 | 10000 | 0.6313 | -1.2979 | -1.6013 | 0.6359 | 0.3033 | -223.3081 | -188.5056 | -1.7629 | -1.7848 |
| 0.4923 | 1.7919 | 10400 | 0.6312 | -1.1596 | -1.4412 | 0.6334 | 0.2815 | -207.2955 | -174.6746 | -1.7754 | -1.7966 |
| 0.5386 | 1.8608 | 10800 | 0.6304 | -1.2706 | -1.5735 | 0.6373 | 0.3029 | -220.5279 | -185.7685 | -1.7500 | -1.7722 |
| 0.5178 | 1.9297 | 11200 | 0.6295 | -1.2859 | -1.6008 | 0.6443 | 0.3149 | -223.2599 | -187.3036 | -1.7272 | -1.7501 |
| 0.5556 | 1.9986 | 11600 | 0.6295 | -1.2652 | -1.5714 | 0.6362 | 0.3062 | -220.3214 | -185.2294 | -1.7356 | -1.7580 |
| 0.4901 | 2.0675 | 12000 | 0.6303 | -1.4749 | -1.8246 | 0.6447 | 0.3497 | -245.6420 | -206.2009 | -1.6688 | -1.6928 |
| 0.4713 | 2.1365 | 12400 | 0.6303 | -1.6230 | -2.0017 | 0.6471 | 0.3786 | -263.3478 | -221.0147 | -1.6397 | -1.6644 |
| 0.5188 | 2.2054 | 12800 | 0.6305 | -1.4593 | -1.8052 | 0.6408 | 0.3458 | -243.6979 | -204.6454 | -1.6776 | -1.7011 |
| 0.5395 | 2.2743 | 13200 | 0.6315 | -1.5373 | -1.9051 | 0.6429 | 0.3678 | -253.6892 | -212.4377 | -1.6591 | -1.6834 |
| 0.5059 | 2.3432 | 13600 | 0.6318 | -1.4799 | -1.8381 | 0.6431 | 0.3582 | -246.9884 | -206.6992 | -1.6812 | -1.7051 |
| 0.4543 | 2.4121 | 14000 | 0.6318 | -1.3717 | -1.7109 | 0.6459 | 0.3392 | -234.2693 | -195.8793 | -1.7134 | -1.7366 |
| 0.5121 | 2.4810 | 14400 | 0.6308 | -1.4206 | -1.7736 | 0.6447 | 0.3530 | -240.5389 | -200.7700 | -1.7016 | -1.7252 |
| 0.4847 | 2.5500 | 14800 | 0.6304 | -1.4817 | -1.8498 | 0.6443 | 0.3681 | -248.1589 | -206.8796 | -1.6912 | -1.7153 |
| 0.4701 | 2.6189 | 15200 | 0.6306 | -1.4145 | -1.7659 | 0.6445 | 0.3514 | -239.7732 | -200.1665 | -1.7090 | -1.7324 |
| 0.5011 | 2.6878 | 15600 | 0.6304 | -1.4080 | -1.7575 | 0.6434 | 0.3495 | -238.9349 | -199.5119 | -1.7135 | -1.7369 |
| 0.4936 | 2.7567 | 16000 | 0.6304 | -1.4490 | -1.8088 | 0.6436 | 0.3598 | -244.0595 | -203.6143 | -1.7010 | -1.7248 |
| 0.4952 | 2.8256 | 16400 | 0.6312 | -1.4483 | -1.8060 | 0.6438 | 0.3577 | -243.7794 | -203.5389 | -1.7043 | -1.7279 |
| 0.5024 | 2.8946 | 16800 | 0.6304 | -1.4492 | -1.8094 | 0.6429 | 0.3602 | -244.1201 | -203.6308 | -1.7037 | -1.7274 |
| 0.5054 | 2.9635 | 17200 | 0.6303 | -1.4484 | -1.8080 | 0.6436 | 0.3596 | -243.9776 | -203.5508 | -1.7024 | -1.7262 |

Framework versions

  • Transformers 4.41.2
  • PyTorch 2.1.2
  • Datasets 2.19.2
  • Tokenizers 0.19.1