
tinyllama-1.1b-chat-dpo-full

This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-chat-sft-full on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5860
  • Rewards/chosen: -1.1602
  • Rewards/rejected: -1.6135
  • Rewards/accuracies: 0.6890
  • Rewards/margins: 0.4533
  • Logps/rejected: -458.4552
  • Logps/chosen: -452.2377
  • Logits/rejected: -2.3877
  • Logits/chosen: -2.4300
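
For reference, a minimal generation sketch using the standard transformers API. This is an illustrative example, not code from the training repository; it assumes the tokenizer ships a chat template, as the TinyLlama chat/SFT checkpoints normally do.

```python
# Minimal usage sketch (not part of the original card): load the model and generate a chat reply.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martimfasantos/tinyllama-1.1b-chat-dpo-full"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Assumes the tokenizer carries a chat template, as TinyLlama chat checkpoints normally do.
messages = [{"role": "user", "content": "Summarize what DPO fine-tuning changes in one sentence."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)

print(tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True))
```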

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
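
These hyperparameters map naturally onto a TRL DPOTrainer run. The sketch below is a hypothetical reconstruction under that assumption (the DPO beta and the exact training script are not reported in this card), not the authors' actual code.

```python
# Hypothetical reconstruction of the training setup, assuming TRL's DPOTrainer produced the
# Rewards/* metrics reported in this card; the actual training script is not included here.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "martimfasantos/tinyllama-1.1b-chat-sft-full"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# "train_prefs" is the preference split of HuggingFaceH4/ultrafeedback_binarized.
train_dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

args = DPOConfig(
    output_dir="tinyllama-1.1b-chat-dpo-full",
    learning_rate=5e-7,
    per_device_train_batch_size=4,   # train_batch_size
    per_device_eval_batch_size=8,    # eval_batch_size
    gradient_accumulation_steps=4,   # 4 * 4 = 16 total train batch size
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

# ref_model=None makes the trainer keep a frozen copy of the SFT model as the DPO reference.
trainer = DPOTrainer(
    model,
    ref_model=None,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # recent TRL releases rename this argument to processing_class
)
trainer.train()
```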

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.693 | 0.0262 | 100 | 0.6929 | -0.0014 | -0.0019 | 0.5320 | 0.0006 | -297.2994 | -336.3557 | -3.1228 | -3.1361 |
| 0.6887 | 0.0523 | 200 | 0.6892 | -0.0302 | -0.0383 | 0.6160 | 0.0081 | -300.9348 | -339.2341 | -3.1215 | -3.1346 |
| 0.6789 | 0.0785 | 300 | 0.6794 | -0.0789 | -0.1087 | 0.6360 | 0.0299 | -307.9798 | -344.1051 | -3.1094 | -3.1216 |
| 0.6624 | 0.1047 | 400 | 0.6635 | -0.1807 | -0.2518 | 0.6390 | 0.0711 | -322.2854 | -354.2890 | -3.0664 | -3.0771 |
| 0.6373 | 0.1309 | 500 | 0.6503 | -0.2988 | -0.4120 | 0.6425 | 0.1133 | -338.3080 | -366.0959 | -2.9693 | -2.9839 |
| 0.6423 | 0.1570 | 600 | 0.6457 | -0.3891 | -0.5345 | 0.6375 | 0.1454 | -350.5518 | -375.1291 | -2.9372 | -2.9538 |
| 0.6266 | 0.1832 | 700 | 0.6420 | -0.7030 | -0.9081 | 0.6365 | 0.2051 | -387.9123 | -406.5211 | -2.9095 | -2.9229 |
| 0.5942 | 0.2094 | 800 | 0.6367 | -0.4969 | -0.6764 | 0.6475 | 0.1795 | -364.7484 | -385.9118 | -2.9255 | -2.9397 |
| 0.6171 | 0.2355 | 900 | 0.6330 | -0.5389 | -0.7443 | 0.6545 | 0.2054 | -371.5351 | -390.1065 | -2.8815 | -2.8992 |
| 0.6156 | 0.2617 | 1000 | 0.6271 | -0.9278 | -1.1788 | 0.6460 | 0.2510 | -414.9855 | -428.9975 | -2.8469 | -2.8665 |
| 0.6636 | 0.2879 | 1100 | 0.6234 | -0.7984 | -1.0304 | 0.6515 | 0.2320 | -400.1489 | -416.0618 | -2.8144 | -2.8347 |
| 0.6832 | 0.3141 | 1200 | 0.6152 | -1.0303 | -1.3170 | 0.6570 | 0.2866 | -428.8004 | -439.2536 | -2.7994 | -2.8212 |
| 0.5967 | 0.3402 | 1300 | 0.6131 | -1.2342 | -1.5321 | 0.6655 | 0.2979 | -450.3198 | -459.6400 | -2.7494 | -2.7756 |
| 0.596 | 0.3664 | 1400 | 0.6064 | -0.8587 | -1.1697 | 0.6820 | 0.3110 | -414.0766 | -422.0903 | -2.8084 | -2.8289 |
| 0.592 | 0.3926 | 1500 | 0.6027 | -0.9689 | -1.3189 | 0.6715 | 0.3499 | -428.9929 | -433.1132 | -2.7455 | -2.7703 |
| 0.6353 | 0.4187 | 1600 | 0.6051 | -0.9640 | -1.3223 | 0.6745 | 0.3582 | -429.3314 | -432.6226 | -2.6972 | -2.7245 |
| 0.6603 | 0.4449 | 1700 | 0.6016 | -0.9893 | -1.3221 | 0.6765 | 0.3328 | -429.3145 | -435.1521 | -2.7021 | -2.7305 |
| 0.5551 | 0.4711 | 1800 | 0.6023 | -1.0035 | -1.3765 | 0.6790 | 0.3731 | -434.7590 | -436.5641 | -2.6159 | -2.6492 |
| 0.5877 | 0.4973 | 1900 | 0.5975 | -0.8137 | -1.1853 | 0.6835 | 0.3716 | -415.6308 | -417.5872 | -2.6621 | -2.6941 |
| 0.5827 | 0.5234 | 2000 | 0.5935 | -0.8724 | -1.2562 | 0.6810 | 0.3838 | -422.7221 | -423.4575 | -2.6043 | -2.6396 |
| 0.6017 | 0.5496 | 2100 | 0.5911 | -1.0065 | -1.3971 | 0.6905 | 0.3907 | -436.8172 | -436.8658 | -2.6105 | -2.6436 |
| 0.5539 | 0.5758 | 2200 | 0.5920 | -0.9060 | -1.2945 | 0.6885 | 0.3884 | -426.5499 | -426.8195 | -2.5724 | -2.6076 |
| 0.5795 | 0.6019 | 2300 | 0.5914 | -1.1164 | -1.5398 | 0.6865 | 0.4234 | -451.0841 | -447.8605 | -2.5399 | -2.5757 |
| 0.5657 | 0.6281 | 2400 | 0.5904 | -1.0347 | -1.4494 | 0.6860 | 0.4147 | -442.0414 | -439.6861 | -2.5121 | -2.5487 |
| 0.5306 | 0.6543 | 2500 | 0.5918 | -1.0464 | -1.4840 | 0.6825 | 0.4376 | -445.5005 | -440.8591 | -2.4692 | -2.5102 |
| 0.5762 | 0.6805 | 2600 | 0.5927 | -1.0687 | -1.5141 | 0.6780 | 0.4455 | -448.5193 | -443.0862 | -2.4291 | -2.4735 |
| 0.6016 | 0.7066 | 2700 | 0.5936 | -1.0767 | -1.5080 | 0.6800 | 0.4313 | -447.9063 | -443.8889 | -2.4329 | -2.4747 |
| 0.6068 | 0.7328 | 2800 | 0.5897 | -1.1905 | -1.6433 | 0.6820 | 0.4527 | -461.4312 | -455.2722 | -2.4294 | -2.4708 |
| 0.5821 | 0.7590 | 2900 | 0.5870 | -1.1245 | -1.5598 | 0.6845 | 0.4353 | -453.0833 | -448.6697 | -2.4470 | -2.4862 |
| 0.5393 | 0.7851 | 3000 | 0.5873 | -1.2223 | -1.6710 | 0.6870 | 0.4486 | -464.2020 | -458.4521 | -2.4161 | -2.4565 |
| 0.577 | 0.8113 | 3100 | 0.5886 | -1.1359 | -1.5757 | 0.6845 | 0.4399 | -454.6796 | -449.8056 | -2.4137 | -2.4538 |
| 0.5731 | 0.8375 | 3200 | 0.5864 | -1.1928 | -1.6493 | 0.6900 | 0.4564 | -462.0313 | -455.5009 | -2.3988 | -2.4401 |
| 0.586 | 0.8636 | 3300 | 0.5865 | -1.1740 | -1.6231 | 0.6895 | 0.4492 | -459.4178 | -453.6159 | -2.3969 | -2.4384 |
| 0.5629 | 0.8898 | 3400 | 0.5860 | -1.1573 | -1.6086 | 0.6890 | 0.4513 | -457.9694 | -451.9486 | -2.3882 | -2.4306 |
| 0.6059 | 0.9160 | 3500 | 0.5858 | -1.1672 | -1.6213 | 0.6890 | 0.4541 | -459.2307 | -452.9388 | -2.3897 | -2.4320 |
| 0.5703 | 0.9422 | 3600 | 0.5860 | -1.1607 | -1.6138 | 0.6870 | 0.4532 | -458.4890 | -452.2865 | -2.3897 | -2.4320 |
| 0.5533 | 0.9683 | 3700 | 0.5858 | -1.1623 | -1.6161 | 0.6880 | 0.4538 | -458.7165 | -452.4510 | -2.3882 | -2.4304 |
| 0.5988 | 0.9945 | 3800 | 0.5862 | -1.1608 | -1.6138 | 0.6885 | 0.4530 | -458.4823 | -452.2973 | -2.3882 | -2.4306 |
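
The Rewards/* columns follow TRL's DPOTrainer logging convention (assumed here, since the metric names match): they are implicit DPO rewards, i.e. beta-scaled log-probability differences between the policy and the frozen reference model. A small helper showing how these evaluation columns are derived from per-sequence log-probs; the beta value used for this run is not reported in the card.

```python
# Assumed interpretation of the Rewards/* columns: TRL's DPOTrainer logs implicit DPO rewards,
# i.e. beta-scaled policy-vs-reference log-prob differences (beta=0.1 is TRL's default; the
# value used for this run is not stated in the card).
import torch

def dpo_eval_metrics(policy_chosen_logps, policy_rejected_logps,
                     ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Recover rewards/chosen, rewards/rejected, rewards/margins and rewards/accuracies
    from summed per-sequence log-probabilities (one entry per preference pair)."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards
    accuracy = (chosen_rewards > rejected_rewards).float().mean()
    return chosen_rewards.mean(), rejected_rewards.mean(), margins.mean(), accuracy
```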

Framework versions

  • Transformers 4.41.1
  • Pytorch 2.1.2
  • Datasets 2.19.1
  • Tokenizers 0.19.1