
tinyllama-1.1b-sum-dpo-qlora

This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-qlora on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6463
  • Rewards/chosen: -0.9560
  • Rewards/rejected: -1.1279
  • Rewards/accuracies: 0.6204
  • Rewards/margins: 0.1719
  • Logps/rejected: -187.9012
  • Logps/chosen: -167.0102
  • Logits/rejected: -3.0162
  • Logits/chosen: -3.0224
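
This repository ships QLoRA (LoRA) adapter weights rather than a full model, so one way to try it is to load the adapter with PEFT's AutoPeftModelForCausalLM, which resolves and loads the base model automatically. The snippet below is a minimal sketch, not an official usage example from the author; the prompt format and generation settings are assumptions and should be adapted to the template used during SFT.

```python
# Minimal sketch for loading the DPO QLoRA adapter and generating a summary.
# Assumptions: the adapter repo includes a tokenizer, and a plain TL;DR-style
# prompt is acceptable; adjust the prompt to match the SFT chat template.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_id = "martimfasantos/tinyllama-1.1b-sum-dpo-qlora"

model = AutoPeftModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

post = "..."  # the text you want summarized
prompt = f"Summarize the following post.\n\n{post}\n\nTL;DR:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```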

Model description

This repository contains QLoRA (LoRA) adapter weights for a TinyLlama-1.1B summarization model. The adapter was trained with Direct Preference Optimization (DPO) on top of the SFT checkpoint martimfasantos/tinyllama-1.1b-sum-sft-qlora, using preference pairs from the openai/summarize_from_feedback dataset.

Intended uses & limitations

The model is intended for abstractive, TL;DR-style summarization in the domain of the openai/summarize_from_feedback dataset. No evaluation of safety, factuality, or out-of-domain behaviour is documented in this card.

Training and evaluation data

Training and evaluation used preference pairs from the openai/summarize_from_feedback dataset (human comparisons between candidate summaries); the results above were computed on the corresponding evaluation set.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
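
These values correspond to what a TRL DPOTrainer recipe would be given. The sketch below is illustrative only: the DPO beta, LoRA settings, and toy dataset are assumptions not recorded in this card, and the DPOTrainer API differs between TRL versions.

```python
# Illustrative DPO training sketch mirroring the hyperparameters listed above.
# Not the author's training script; beta, LoRA settings, and the toy dataset
# are assumptions, and the DPOTrainer signature varies across TRL versions.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoTokenizer, TrainingArguments
from trl import DPOTrainer

sft_model = "martimfasantos/tinyllama-1.1b-sum-sft-qlora"  # SFT starting point
tokenizer = AutoTokenizer.from_pretrained(sft_model)

# Toy preference pairs in the prompt/chosen/rejected format DPO expects;
# the real run used openai/summarize_from_feedback comparisons instead.
pairs = Dataset.from_dict({
    "prompt":   ["Summarize the post below.\n\n<post text>\n\nTL;DR:"],
    "chosen":   [" a concise, faithful summary"],
    "rejected": [" a rambling, less faithful summary"],
})

training_args = TrainingArguments(
    output_dir="tinyllama-1.1b-sum-dpo-qlora",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,   # 4 x 4 = effective batch size of 16
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                       # assumption: mixed precision on GPU
)

peft_config = LoraConfig(            # assumption: representative QLoRA adapter settings
    r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM",
)

trainer = DPOTrainer(
    model=sft_model,
    args=training_args,
    beta=0.1,                        # assumption: beta is not recorded in this card
    train_dataset=pairs,
    eval_dataset=pairs,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```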

Training results

Training Loss Epoch Step Validation Loss Rewards/chosen Rewards/rejected Rewards/accuracies Rewards/margins Logps/rejected Logps/chosen Logits/rejected Logits/chosen
0.6927 0.02 100 0.6930 0.0049 0.0047 0.5204 0.0003 -74.6416 -70.9175 -3.4966 -3.4983
0.692 0.03 200 0.6926 0.0146 0.0135 0.5616 0.0012 -73.7585 -69.9458 -3.4924 -3.4942
0.6887 0.05 300 0.6911 0.0351 0.0308 0.5732 0.0043 -72.0302 -67.9024 -3.4858 -3.4876
0.6865 0.07 400 0.6890 0.0164 0.0077 0.5609 0.0087 -74.3370 -69.7677 -3.4786 -3.4805
0.6864 0.09 500 0.6864 0.0236 0.0089 0.5755 0.0146 -74.2129 -69.0538 -3.4662 -3.4680
0.6731 0.1 600 0.6838 0.0019 -0.0189 0.5871 0.0209 -77.0012 -71.2189 -3.4497 -3.4515
0.6749 0.12 700 0.6788 -0.0758 -0.1091 0.5980 0.0333 -86.0178 -78.9945 -3.4470 -3.4489
0.6678 0.14 800 0.6741 -0.1859 -0.2330 0.5906 0.0471 -98.4033 -89.9991 -3.4169 -3.4188
0.6655 0.16 900 0.6709 -0.1856 -0.2411 0.5927 0.0555 -99.2188 -89.9669 -3.3811 -3.3826
0.6695 0.17 1000 0.6686 -0.3893 -0.4584 0.5946 0.0691 -120.9453 -110.3432 -3.3595 -3.3611
0.6648 0.19 1100 0.6702 -0.2078 -0.2671 0.5976 0.0593 -101.8174 -92.1903 -3.3439 -3.3453
0.6543 0.21 1200 0.6642 -0.3511 -0.4313 0.6011 0.0802 -118.2354 -106.5216 -3.3096 -3.3110
0.6535 0.22 1300 0.6605 -0.4651 -0.5609 0.5990 0.0957 -131.1967 -117.9248 -3.2817 -3.2832
0.6315 0.24 1400 0.6606 -0.3801 -0.4704 0.6138 0.0903 -122.1497 -109.4246 -3.2773 -3.2788
0.6595 0.26 1500 0.6544 -0.5561 -0.6712 0.6197 0.1151 -142.2231 -127.0196 -3.2429 -3.2446
0.6383 0.28 1600 0.6538 -0.5868 -0.7052 0.6178 0.1184 -145.6309 -130.0926 -3.2318 -3.2338
0.6775 0.29 1700 0.6568 -0.4687 -0.5717 0.6173 0.1030 -132.2748 -118.2820 -3.2194 -3.2212
0.6312 0.31 1800 0.6497 -0.7203 -0.8617 0.6111 0.1414 -161.2767 -143.4406 -3.1213 -3.1237
0.665 0.33 1900 0.6551 -0.5175 -0.6278 0.6134 0.1103 -137.8867 -123.1614 -3.1660 -3.1680
0.6385 0.34 2000 0.6522 -0.6166 -0.7379 0.6162 0.1213 -148.8959 -133.0700 -3.1823 -3.1845
0.6452 0.36 2100 0.6538 -0.7088 -0.8325 0.6048 0.1237 -158.3535 -142.2912 -3.1344 -3.1369
0.6024 0.38 2200 0.6527 -0.6378 -0.7639 0.6120 0.1262 -151.5019 -135.1858 -3.1567 -3.1596
0.5912 0.4 2300 0.6485 -0.8992 -1.0561 0.6106 0.1569 -180.7164 -161.3302 -3.0812 -3.0853
0.6188 0.41 2400 0.6488 -0.9960 -1.1662 0.6204 0.1702 -191.7268 -171.0100 -3.0219 -3.0276
0.6286 0.43 2500 0.6483 -0.8764 -1.0333 0.6076 0.1568 -178.4354 -159.0542 -3.0428 -3.0475
0.61 0.45 2600 0.6532 -0.7428 -0.8730 0.6018 0.1302 -162.4074 -145.6894 -3.0767 -3.0804
0.6295 0.47 2700 0.6526 -0.6786 -0.8083 0.6138 0.1296 -155.9322 -139.2748 -3.1080 -3.1114
0.6504 0.48 2800 0.6510 -0.7810 -0.9243 0.6106 0.1432 -167.5323 -149.5115 -3.0877 -3.0915
0.6226 0.5 2900 0.6513 -0.7637 -0.9050 0.6127 0.1413 -165.6116 -147.7837 -3.0831 -3.0870
0.6226 0.52 3000 0.6494 -0.7375 -0.8834 0.6078 0.1459 -163.4444 -145.1619 -3.0916 -3.0955
0.6062 0.53 3100 0.6485 -0.7793 -0.9311 0.6129 0.1518 -168.2215 -149.3398 -3.0906 -3.0949
0.6071 0.55 3200 0.6477 -0.8041 -0.9577 0.6118 0.1536 -170.8775 -151.8242 -3.0911 -3.0956
0.608 0.57 3300 0.6461 -1.1115 -1.2974 0.6150 0.1859 -204.8467 -182.5597 -3.0002 -3.0064
0.5996 0.59 3400 0.6486 -0.7960 -0.9481 0.6099 0.1520 -169.9129 -151.0113 -3.0691 -3.0742
0.6081 0.6 3500 0.6478 -0.8354 -0.9930 0.6157 0.1576 -174.4116 -154.9542 -3.0630 -3.0681
0.6256 0.62 3600 0.6491 -0.7744 -0.9234 0.6145 0.1489 -167.4422 -148.8546 -3.0722 -3.0769
0.5969 0.64 3700 0.6469 -0.9732 -1.1419 0.6150 0.1687 -189.2978 -168.7282 -3.0171 -3.0231
0.6272 0.65 3800 0.6472 -0.9477 -1.1124 0.6176 0.1648 -186.3489 -166.1768 -3.0087 -3.0145
0.6222 0.67 3900 0.6467 -0.9719 -1.1400 0.6166 0.1681 -189.1107 -168.6043 -3.0040 -3.0100
0.605 0.69 4000 0.6461 -1.0773 -1.2558 0.6204 0.1785 -200.6857 -179.1379 -2.9783 -2.9849
0.585 0.71 4100 0.6464 -0.9836 -1.1556 0.6164 0.1720 -190.6670 -169.7659 -3.0024 -3.0086
0.6602 0.72 4200 0.6465 -0.9496 -1.1182 0.6178 0.1686 -186.9268 -166.3669 -3.0089 -3.0150
0.6074 0.74 4300 0.6468 -0.8954 -1.0597 0.6183 0.1643 -181.0816 -160.9504 -3.0248 -3.0306
0.6105 0.76 4400 0.6470 -0.8905 -1.0547 0.6150 0.1641 -180.5745 -160.4626 -3.0306 -3.0365
0.6127 0.78 4500 0.6470 -0.8899 -1.0538 0.6183 0.1638 -180.4842 -160.4037 -3.0280 -3.0338
0.5798 0.79 4600 0.6468 -0.9128 -1.0793 0.6208 0.1665 -183.0344 -162.6864 -3.0195 -3.0255
0.6228 0.81 4700 0.6467 -0.9215 -1.0896 0.6192 0.1681 -184.0640 -163.5562 -3.0231 -3.0291
0.6131 0.83 4800 0.6466 -0.9391 -1.1091 0.6199 0.1700 -186.0176 -165.3165 -3.0141 -3.0202
0.6215 0.84 4900 0.6465 -0.9478 -1.1189 0.6197 0.1711 -186.9947 -166.1919 -3.0180 -3.0241
0.585 0.86 5000 0.6460 -0.9592 -1.1321 0.6201 0.1729 -188.3154 -167.3252 -3.0164 -3.0226
0.6478 0.88 5100 0.6460 -0.9606 -1.1336 0.6194 0.1730 -188.4695 -167.4737 -3.0151 -3.0213
0.6018 0.9 5200 0.6462 -0.9572 -1.1296 0.6206 0.1725 -188.0692 -167.1259 -3.0105 -3.0167
0.5963 0.91 5300 0.6465 -0.9564 -1.1282 0.6199 0.1718 -187.9285 -167.0541 -3.0167 -3.0229
0.5921 0.93 5400 0.6462 -0.9569 -1.1292 0.6199 0.1723 -188.0274 -167.0996 -3.0133 -3.0196
0.6015 0.95 5500 0.6463 -0.9570 -1.1292 0.6192 0.1723 -188.0282 -167.1056 -3.0164 -3.0226
0.6148 0.96 5600 0.6461 -0.9543 -1.1269 0.6194 0.1726 -187.7934 -166.8396 -3.0142 -3.0205
0.6299 0.98 5700 0.6462 -0.9543 -1.1263 0.6194 0.1720 -187.7363 -166.8363 -3.0166 -3.0228
0.5854 1.0 5800 0.6463 -0.9560 -1.1279 0.6204 0.1719 -187.9012 -167.0102 -3.0162 -3.0224
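
For context when reading the reward columns: in DPO, the implicit reward of a completion is beta times the gap between the policy and reference log-probabilities, the margin is simply the chosen reward minus the rejected reward, and the accuracy is the fraction of preference pairs with a positive margin. A small sketch using the final evaluation row:

```python
# Standard DPO bookkeeping, not card-specific code:
# reward(y) = beta * (logp_policy(y|x) - logp_ref(y|x)); the card reports the
# averages of these rewards for the chosen and rejected summaries.
rewards_chosen, rewards_rejected = -0.9560, -1.1279    # final evaluation row
margin = rewards_chosen - rewards_rejected             # Rewards/margins column
print(round(margin, 4))                                # 0.1719
# Rewards/accuracies (0.6204) is the fraction of preference pairs for which
# this margin is positive, i.e. the chosen summary receives the higher reward.
```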

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.3
  • PyTorch 2.1.2
  • Datasets 2.18.0
  • Tokenizers 0.15.2