
tinyllama-1.1b-sum-dpo-full_LR5e-7_3epochs

This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-full on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set (the reward metrics are explained after the list):

  • Loss: 0.7099
  • Rewards/chosen: -2.8601
  • Rewards/rejected: -3.4154
  • Rewards/accuracies: 0.6320
  • Rewards/margins: 0.5553
  • Logps/rejected: -404.2897
  • Logps/chosen: -345.0273
  • Logits/rejected: -1.9822
  • Logits/chosen: -2.0068
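
These metric names follow the DPO convention (as in TRL's DPOTrainer, which the hyperparameter names below suggest was used; this is an inference, not stated in the card): the implicit reward of a completion y for prompt x is the β-scaled log-probability ratio between the policy and the frozen SFT reference, and the loss is the negative log-sigmoid of the chosen-vs-rejected margin. β itself is not recorded in this card.

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)
$$

Accordingly, Rewards/margins is the mean of Rewards/chosen minus Rewards/rejected (here 0.5553 = -2.8601 - (-3.4154)), and Rewards/accuracies is the fraction of evaluation pairs in which the chosen summary receives the higher implicit reward.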

Model description

More information needed

Intended uses & limitations

More information needed
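
Pending official guidance, the model can be exercised as a causal LM for summarization. The sketch below is a minimal, hedged example: the TL;DR-style prompt template and the generation settings are assumptions chosen to match the summarize_from_feedback data, not a documented interface.

```python
# Minimal inference sketch. Assumptions: TL;DR-style prompting, greedy decoding.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martimfasantos/tinyllama-1.1b-sum-dpo-full_LR5e-7_3epochs"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

post = "Your Reddit-style post to summarize goes here."
prompt = f"{post}\n\nTL;DR:"  # hypothetical template; check the SFT data format

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```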

Training and evaluation data

More information needed
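
The card does not describe the preprocessing or splits; for orientation, the preference data named above can be loaded from the Hub as below. Field names follow the dataset's "comparisons" configuration, which holds the human-labeled summary pairs used for DPO-style training.

```python
# Load the OpenAI TL;DR human-feedback comparisons (pairs of summaries with a human choice).
from datasets import load_dataset

ds = load_dataset("openai/summarize_from_feedback", "comparisons")
example = ds["train"][0]
print(example["info"]["post"][:200])               # the source post
print([s["text"] for s in example["summaries"]])   # the two candidate summaries
print(example["choice"])                           # index of the human-preferred summary
```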

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged reconstruction in code follows the list):

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
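
A sketch of how these settings map onto a TRL DPOTrainer run is given below. It is a reconstruction under stated assumptions, not the author's script: TRL itself, beta=0.1 (TRL's default), and the toy preference pairs standing in for the real comparisons data are all assumptions.

```python
# Hedged reconstruction of the training setup with TRL's DPOTrainer.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

sft_id = "martimfasantos/tinyllama-1.1b-sum-sft-full"  # SFT checkpoint named above
model = AutoModelForCausalLM.from_pretrained(sft_id)
tokenizer = AutoTokenizer.from_pretrained(sft_id)

# Placeholder preference pairs; the real run used openai/summarize_from_feedback.
pairs = Dataset.from_dict({
    "prompt": ["POST: ...\n\nTL;DR:"],
    "chosen": [" A concise, faithful summary."],
    "rejected": [" A rambling or inaccurate summary."],
})

config = DPOConfig(
    output_dir="tinyllama-1.1b-sum-dpo-full_LR5e-7_3epochs",
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # 8 x 2 matches the reported total batch of 16
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    beta=0.1,  # assumption: TRL's default; the card does not record beta
)

trainer = DPOTrainer(
    model=model,          # ref_model defaults to a frozen copy of the SFT model
    args=config,
    train_dataset=pairs,
    tokenizer=tokenizer,  # renamed to processing_class in newer TRL releases
)
trainer.train()
```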

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.689 | 0.0689 | 400 | 0.6921 | 0.0010 | -0.0011 | 0.5616 | 0.0021 | -62.8638 | -58.9160 | -2.9633 | -2.9669 |
| 0.6822 | 0.1378 | 800 | 0.6861 | -0.0503 | -0.0663 | 0.5746 | 0.0160 | -69.3792 | -64.0464 | -2.9255 | -2.9291 |
| 0.6737 | 0.2068 | 1200 | 0.6780 | -0.2790 | -0.3169 | 0.5762 | 0.0379 | -94.4367 | -86.9165 | -2.8527 | -2.8562 |
| 0.6648 | 0.2757 | 1600 | 0.6677 | -0.4500 | -0.5183 | 0.6029 | 0.0683 | -114.5829 | -104.0142 | -2.7578 | -2.7612 |
| 0.6678 | 0.3446 | 2000 | 0.6576 | -0.7094 | -0.8175 | 0.6217 | 0.1081 | -144.4979 | -129.9582 | -2.6611 | -2.6651 |
| 0.6253 | 0.4135 | 2400 | 0.6468 | -1.0987 | -1.2558 | 0.6236 | 0.1571 | -188.3249 | -168.8844 | -2.4966 | -2.5038 |
| 0.6616 | 0.4824 | 2800 | 0.6473 | -0.7839 | -0.9244 | 0.6303 | 0.1405 | -155.1877 | -137.4051 | -2.4668 | -2.4737 |
| 0.6282 | 0.5513 | 3200 | 0.6395 | -1.3763 | -1.5943 | 0.6331 | 0.2181 | -222.1840 | -196.6437 | -2.2441 | -2.2573 |
| 0.5886 | 0.6203 | 3600 | 0.6382 | -1.2763 | -1.4872 | 0.6355 | 0.2109 | -211.4734 | -186.6474 | -2.1487 | -2.1634 |
| 0.5903 | 0.6892 | 4000 | 0.6398 | -1.0104 | -1.2131 | 0.6366 | 0.2027 | -184.0546 | -160.0534 | -2.1888 | -2.2035 |
| 0.5886 | 0.7581 | 4400 | 0.6349 | -1.2844 | -1.5732 | 0.6341 | 0.2888 | -220.0676 | -187.4508 | -2.0898 | -2.1111 |
| 0.5907 | 0.8270 | 4800 | 0.6306 | -1.3443 | -1.6135 | 0.6478 | 0.2692 | -224.0959 | -193.4449 | -2.0942 | -2.1137 |
| 0.5456 | 0.8959 | 5200 | 0.6327 | -1.1753 | -1.4199 | 0.6408 | 0.2446 | -204.7423 | -176.5441 | -2.1214 | -2.1394 |
| 0.5465 | 0.9649 | 5600 | 0.6325 | -1.2769 | -1.5500 | 0.6371 | 0.2731 | -217.7467 | -186.7071 | -2.0669 | -2.0872 |
| 0.4632 | 1.0338 | 6000 | 0.6484 | -2.1822 | -2.6404 | 0.6496 | 0.4582 | -326.7876 | -277.2339 | -1.8836 | -1.9125 |
| 0.4736 | 1.1027 | 6400 | 0.6454 | -2.1568 | -2.5961 | 0.6547 | 0.4393 | -322.3579 | -274.6943 | -1.8531 | -1.8794 |
| 0.4665 | 1.1716 | 6800 | 0.6386 | -1.8958 | -2.2728 | 0.6443 | 0.3770 | -290.0295 | -248.5992 | -1.8821 | -1.9042 |
| 0.4789 | 1.2405 | 7200 | 0.6483 | -1.9198 | -2.2931 | 0.6403 | 0.3733 | -292.0611 | -250.9941 | -1.9443 | -1.9659 |
| 0.5477 | 1.3094 | 7600 | 0.6413 | -1.7843 | -2.1677 | 0.6499 | 0.3834 | -279.5165 | -237.4425 | -1.9622 | -1.9845 |
| 0.4423 | 1.3784 | 8000 | 0.6528 | -2.0003 | -2.3620 | 0.6415 | 0.3617 | -298.9479 | -259.0417 | -1.9266 | -1.9469 |
| 0.4668 | 1.4473 | 8400 | 0.6515 | -1.8405 | -2.1818 | 0.6403 | 0.3413 | -280.9325 | -243.0684 | -1.9825 | -2.0027 |
| 0.509 | 1.5162 | 8800 | 0.6471 | -1.9547 | -2.3166 | 0.6424 | 0.3619 | -294.4091 | -254.4828 | -2.0224 | -2.0422 |
| 0.4177 | 1.5851 | 9200 | 0.6542 | -1.9336 | -2.3034 | 0.6392 | 0.3699 | -293.0923 | -252.3707 | -1.9854 | -2.0064 |
| 0.4181 | 1.6540 | 9600 | 0.6626 | -2.3352 | -2.8057 | 0.6438 | 0.4706 | -343.3230 | -292.5314 | -1.9265 | -1.9501 |
| 0.4469 | 1.7229 | 10000 | 0.6436 | -1.8037 | -2.1726 | 0.6431 | 0.3689 | -280.0089 | -239.3807 | -2.0388 | -2.0591 |
| 0.4365 | 1.7919 | 10400 | 0.6446 | -1.7691 | -2.1263 | 0.6466 | 0.3572 | -275.3837 | -235.9303 | -2.0443 | -2.0637 |
| 0.4488 | 1.8608 | 10800 | 0.6558 | -2.1203 | -2.5393 | 0.6450 | 0.4190 | -316.6843 | -271.0489 | -2.0317 | -2.0535 |
| 0.4611 | 1.9297 | 11200 | 0.6646 | -2.4708 | -2.9416 | 0.6468 | 0.4708 | -356.9083 | -306.0948 | -1.9987 | -2.0224 |
| 0.4546 | 1.9986 | 11600 | 0.6541 | -2.2751 | -2.7321 | 0.6436 | 0.4570 | -335.9583 | -286.5284 | -1.9967 | -2.0195 |
| 0.3836 | 2.0675 | 12000 | 0.6827 | -2.7558 | -3.3214 | 0.6464 | 0.5655 | -394.8881 | -334.6001 | -1.9585 | -1.9844 |
| 0.337 | 2.1365 | 12400 | 0.7083 | -3.2136 | -3.8269 | 0.6424 | 0.6132 | -445.4347 | -380.3789 | -1.9217 | -1.9480 |
| 0.3756 | 2.2054 | 12800 | 0.6892 | -2.5637 | -3.0760 | 0.6378 | 0.5123 | -370.3519 | -315.3893 | -1.9938 | -2.0171 |
| 0.4071 | 2.2743 | 13200 | 0.6989 | -2.7240 | -3.2763 | 0.6345 | 0.5523 | -390.3795 | -331.4143 | -1.9810 | -2.0059 |
| 0.4236 | 2.3432 | 13600 | 0.7127 | -2.9174 | -3.4982 | 0.6329 | 0.5808 | -412.5668 | -350.7576 | -1.9542 | -1.9798 |
| 0.3527 | 2.4121 | 14000 | 0.7006 | -2.6980 | -3.2475 | 0.6252 | 0.5496 | -387.5038 | -328.8109 | -1.9852 | -2.0098 |
| 0.3258 | 2.4810 | 14400 | 0.7095 | -2.9212 | -3.5009 | 0.6292 | 0.5798 | -412.8438 | -351.1316 | -1.9581 | -1.9835 |
| 0.3646 | 2.5500 | 14800 | 0.7041 | -2.7281 | -3.2711 | 0.6350 | 0.5430 | -389.8630 | -331.8257 | -1.9884 | -2.0127 |
| 0.3596 | 2.6189 | 15200 | 0.7046 | -2.7894 | -3.3372 | 0.6359 | 0.5478 | -396.4674 | -337.9509 | -1.9862 | -2.0104 |
| 0.3549 | 2.6878 | 15600 | 0.7067 | -2.8436 | -3.3930 | 0.6310 | 0.5494 | -402.0518 | -343.3737 | -1.9841 | -2.0084 |
| 0.2868 | 2.7567 | 16000 | 0.7117 | -2.9064 | -3.4673 | 0.6289 | 0.5609 | -409.4747 | -349.6523 | -1.9770 | -2.0016 |
| 0.3243 | 2.8256 | 16400 | 0.7086 | -2.8350 | -3.3883 | 0.6320 | 0.5533 | -401.5786 | -342.5143 | -1.9841 | -2.0085 |
| 0.3963 | 2.8946 | 16800 | 0.7104 | -2.8648 | -3.4205 | 0.6301 | 0.5558 | -404.8014 | -345.4919 | -1.9835 | -2.0081 |
| 0.3399 | 2.9635 | 17200 | 0.7095 | -2.8594 | -3.4153 | 0.6336 | 0.5559 | -404.2798 | -344.9560 | -1.9830 | -2.0075 |

Framework versions

  • Transformers 4.41.2
  • PyTorch 2.1.2
  • Datasets 2.19.2
  • Tokenizers 0.19.1