zephyr-220m-dpo-full

This model is a fine-tuned version of amazingvince/zephyr-220m-sft-full; the training dataset is not specified. It achieves the following results on the evaluation set:

Loss: 0.5608
Rewards/chosen: 0.4691
Rewards/rejected: -0.0455
Rewards/accuracies: 0.6930
Rewards/margins: 0.5145
Logps/rejected: -438.4595
Logps/chosen: -544.6858
Logits/rejected: -4.0092
Logits/chosen: -3.9839
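
The reward figures above are related to each other and to the loss. The following is a minimal sketch, assuming the run used the standard sigmoid DPO loss (the card does not state the loss type or the beta value). Under that assumption the margin is the chosen reward minus the rejected reward, and the loss for one preference pair is -log(sigmoid(margin)). The variable and function names are illustrative and are not taken from the training code.

```python
import math

# Final evaluation metrics as reported in this card.
rewards_chosen = 0.4691
rewards_rejected = -0.0455
reported_margin = 0.5145
reported_loss = 0.5608

# Margin = chosen reward minus rejected reward (agrees with the
# reported value up to rounding in the fourth decimal place).
margin = rewards_chosen - rewards_rejected


def dpo_pair_loss(pair_margin: float) -> float:
    """Sigmoid DPO loss for one pair: -log(sigmoid(m)) = log(1 + exp(-m)).

    Assumption: standard sigmoid DPO loss; the card does not confirm this.
    """
    return math.log1p(math.exp(-pair_margin))


# Loss evaluated at the mean margin. This is a lower bound on the mean
# per-pair loss, because the per-pair loss is convex in the margin.
loss_at_mean_margin = dpo_pair_loss(margin)

print(f"margin from rewards:  {margin:.4f}")
print(f"loss at mean margin:  {loss_at_mean_margin:.4f}")
print(f"loss at zero margin:  {dpo_pair_loss(0.0):.4f}")
```

If the assumption holds, two things follow. The reported loss (0.5608) sits above the loss evaluated at the mean margin, which is what averaging a convex per-pair loss over pairs with varying margins produces. A margin of zero gives ln 2, about 0.6931, close to the 0.6932 validation loss logged at step 100 in the training results.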

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 8
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 2
total_train_batch_size: 16
total_eval_batch_size: 8
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1
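
Two of these settings interact: the total batch size follows from the per-device batch size and the device count, and the warmup ratio combines with the linear scheduler to set the learning rate at each step. The sketch below illustrates both in plain Python. It assumes a gradient accumulation factor of 1 (not listed above, but consistent with 8 x 2 = 16) and a warmup length of ceil(total_steps * warmup_ratio). The `total_steps` value is hypothetical, since the card does not report the exact number of optimizer steps.

```python
import math

# Values taken from the hyperparameter list above.
LEARNING_RATE = 5e-07
TRAIN_BATCH_SIZE = 8
NUM_DEVICES = 2
WARMUP_RATIO = 0.1


def effective_batch_size(per_device: int, num_devices: int,
                         grad_accum_steps: int = 1) -> int:
    """Total examples per optimizer step across all devices."""
    return per_device * num_devices * grad_accum_steps


def linear_schedule_lr(step: int, total_steps: int,
                       peak_lr: float = LEARNING_RATE,
                       warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical run length, chosen only to illustrate the schedule shape.
TOTAL_STEPS = 1000

print(effective_batch_size(TRAIN_BATCH_SIZE, NUM_DEVICES))
for step in (0, 50, 100, 550, 1000):
    print(step, linear_schedule_lr(step, TOTAL_STEPS))
```

With a 0.1 warmup ratio the learning rate peaks one tenth of the way through the run and then decays linearly, so most of the single epoch is trained below the nominal 5e-07.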

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6906	0.03	100	0.6932	0.0008	0.0007	0.4860	0.0002	-437.9984	-549.3683	-4.0893	-4.0515
0.6844	0.05	200	0.6855	0.0323	0.0173	0.5640	0.0150	-437.8319	-549.0540	-4.0871	-4.0501
0.6685	0.08	300	0.6675	0.1075	0.0537	0.6160	0.0538	-437.4682	-548.3016	-4.0788	-4.0432
0.6579	0.1	400	0.6426	0.2153	0.0941	0.6430	0.1212	-437.0637	-547.2234	-4.0645	-4.0309
0.6331	0.13	500	0.6241	0.2980	0.1106	0.6430	0.1874	-436.8989	-546.3970	-4.0525	-4.0221
0.6229	0.15	600	0.6138	0.3428	0.1103	0.6580	0.2325	-436.9023	-545.9487	-4.0402	-4.0116
0.6008	0.18	700	0.6053	0.3822	0.0970	0.6560	0.2852	-437.0354	-545.5550	-4.0301	-4.0042
0.5751	0.21	800	0.5998	0.4077	0.0879	0.6540	0.3198	-437.1260	-545.2994	-4.0359	-4.0099
0.6485	0.23	900	0.5922	0.4208	0.0655	0.6600	0.3553	-437.3501	-545.1683	-4.0167	-3.9936
0.6164	0.26	1000	0.5880	0.4046	0.0287	0.6620	0.3759	-437.7182	-545.3309	-4.0092	-3.9869
0.6225	0.28	1100	0.5852	0.4058	0.0110	0.6680	0.3948	-437.8951	-545.3189	-4.0240	-3.9984
0.6289	0.31	1200	0.5824	0.4127	0.0078	0.6670	0.4048	-437.9265	-545.2498	-4.0253	-3.9994
0.5818	0.34	1300	0.5818	0.4222	0.0097	0.6680	0.4125	-437.9080	-545.1544	-4.0212	-3.9953
0.567	0.36	1400	0.5797	0.4098	-0.0141	0.6730	0.4238	-438.1456	-545.2791	-4.0333	-4.0062
0.5659	0.39	1500	0.5790	0.4204	-0.0154	0.6780	0.4358	-438.1591	-545.1725	-4.0245	-3.9963
0.5993	0.41	1600	0.5783	0.4161	-0.0285	0.6720	0.4446	-438.2904	-545.2161	-4.0185	-3.9907
0.5999	0.44	1700	0.5767	0.4067	-0.0468	0.6840	0.4535	-438.4729	-545.3095	-4.0207	-3.9935
0.6004	0.46	1800	0.5731	0.4233	-0.0394	0.6830	0.4627	-438.3991	-545.1437	-4.0219	-3.9944
0.5349	0.49	1900	0.5720	0.4285	-0.0429	0.6830	0.4714	-438.4335	-545.0914	-4.0295	-4.0012
0.5377	0.52	2000	0.5702	0.4255	-0.0540	0.6850	0.4795	-438.5449	-545.1220	-4.0290	-4.0009
0.4988	0.54	2100	0.5713	0.4347	-0.0548	0.6840	0.4895	-438.5533	-545.0299	-4.0317	-4.0039
0.6093	0.57	2200	0.5706	0.4464	-0.0456	0.6810	0.4920	-438.4607	-544.9128	-4.0288	-4.0014
0.5356	0.59	2300	0.5689	0.4484	-0.0486	0.6880	0.4971	-438.4912	-544.8922	-4.0257	-3.9986
0.5753	0.62	2400	0.5681	0.4596	-0.0441	0.6850	0.5037	-438.4457	-544.7802	-4.0100	-3.9846
0.5709	0.65	2500	0.5673	0.4693	-0.0387	0.6910	0.5081	-438.3924	-544.6835	-4.0100	-3.9849
0.5565	0.67	2600	0.5665	0.4692	-0.0401	0.6820	0.5092	-438.4054	-544.6850	-4.0096	-3.9843
0.585	0.7	2700	0.5650	0.4780	-0.0351	0.6940	0.5131	-438.3558	-544.5962	-4.0074	-3.9820
0.5883	0.72	2800	0.5670	0.4914	-0.0151	0.6880	0.5066	-438.1562	-544.4624	-3.9894	-3.9669
0.624	0.75	2900	0.5663	0.4877	-0.0191	0.6840	0.5068	-438.1958	-544.4997	-3.9935	-3.9705
0.5347	0.77	3000	0.5644	0.4757	-0.0335	0.6850	0.5092	-438.3401	-544.6199	-4.0019	-3.9777
0.5837	0.8	3100	0.5637	0.4783	-0.0302	0.6830	0.5085	-438.3073	-544.5936	-3.9976	-3.9742
0.5293	0.83	3200	0.5634	0.4715	-0.0363	0.6890	0.5078	-438.3679	-544.6616	-4.0023	-3.9778
0.5128	0.85	3300	0.5620	0.4745	-0.0387	0.6880	0.5131	-438.3917	-544.6319	-4.0053	-3.9804
0.6204	0.88	3400	0.5625	0.4679	-0.0442	0.6860	0.5121	-438.4469	-544.6978	-4.0067	-3.9815
0.5469	0.9	3500	0.5618	0.4612	-0.0491	0.6860	0.5102	-438.4956	-544.7651	-4.0098	-3.9843
0.5807	0.93	3600	0.5615	0.4675	-0.0454	0.6890	0.5129	-438.4584	-544.7015	-4.0068	-3.9818
0.5265	0.96	3700	0.5620	0.4675	-0.0435	0.6880	0.5110	-438.4403	-544.7019	-4.0082	-3.9833
0.5484	0.98	3800	0.5615	0.4685	-0.0449	0.6930	0.5133	-438.4536	-544.6919	-4.0103	-3.9851
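
The table above is tab-separated, so it can be read with the standard library for plotting or checkpoint selection. The sketch below parses an excerpt of three rows copied from the table and picks the row with the lowest validation loss and the row with the highest reward accuracy. The names `TABLE_EXCERPT` and `parse_table` are illustrative, and the results apply to the excerpt only, not to the full table.

```python
import csv
import io

# Header plus three rows copied verbatim from the table above (steps 100, 2700, 3800).
TABLE_EXCERPT = "\n".join([
    "Training Loss\tEpoch\tStep\tValidation Loss\tRewards/chosen\tRewards/rejected\t"
    "Rewards/accuracies\tRewards/margins\tLogps/rejected\tLogps/chosen\t"
    "Logits/rejected\tLogits/chosen",
    "0.6906\t0.03\t100\t0.6932\t0.0008\t0.0007\t0.4860\t0.0002\t"
    "-437.9984\t-549.3683\t-4.0893\t-4.0515",
    "0.585\t0.7\t2700\t0.5650\t0.4780\t-0.0351\t0.6940\t0.5131\t"
    "-438.3558\t-544.5962\t-4.0074\t-3.9820",
    "0.5484\t0.98\t3800\t0.5615\t0.4685\t-0.0449\t0.6930\t0.5133\t"
    "-438.4536\t-544.6919\t-4.0103\t-3.9851",
])


def parse_table(text: str) -> list[dict[str, float]]:
    """Parse tab-separated rows into dicts of floats keyed by column name."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [{name: float(value) for name, value in row.items()} for row in reader]


rows = parse_table(TABLE_EXCERPT)
lowest_loss = min(rows, key=lambda r: r["Validation Loss"])
highest_acc = max(rows, key=lambda r: r["Rewards/accuracies"])

print("lowest validation loss at step", int(lowest_loss["Step"]))
print("highest reward accuracy at step", int(highest_acc["Step"]))
```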

Framework versions

Transformers 4.37.0.dev0
Pytorch 2.1.2+cu121
Datasets 2.15.0
Tokenizers 0.15.0

Training run logs (Weights & Biases): https://wandb.ai/amazingvince/huggingface/runs/z71h0hc3?workspace=user-amazingvince
