Mistral-7B-Instruct-v0.1-dpo-full-1-epoch-hydrox-safe

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.1 on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.0040
Rewards/chosen: 0.1378
Rewards/rejected: -29.0317
Rewards/accuracies: 0.9983
Rewards/margins: 29.1695
Logps/rejected: -714.5497
Logps/chosen: -254.4278
Logits/rejected: -3.3257
Logits/chosen: -3.4722

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 8
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 8
total_train_batch_size: 64
total_eval_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.1608	0.03	100	0.1654	1.2374	-2.6089	0.9571	3.8463	-450.3222	-243.4314	-3.2204	-3.2045
0.1349	0.07	200	0.0961	0.9406	-6.3451	0.9756	7.2857	-487.6837	-246.3994	-3.1898	-3.2216
0.1065	0.1	300	0.1015	-0.2203	-9.2710	0.9840	9.0507	-516.9434	-258.0089	-3.1999	-3.2283
0.0876	0.14	400	0.0597	-1.4412	-13.6992	0.9865	12.2580	-561.2250	-270.2174	-3.2066	-3.2753
0.304	0.17	500	0.0874	-0.2677	-17.2497	0.9891	16.9821	-596.7302	-258.4822	-3.2093	-3.2601
0.1206	0.2	600	0.0686	-0.4252	-15.6514	0.9891	15.2262	-580.7473	-260.0578	-3.1689	-3.2024
0.0176	0.24	700	0.0630	-0.7082	-17.5291	0.9933	16.8209	-599.5242	-262.8876	-3.2305	-3.2958
0.0461	0.27	800	0.0341	-1.2542	-21.2558	0.9933	20.0016	-636.7914	-268.3477	-3.3936	-3.5158
0.0185	0.31	900	0.0291	0.3781	-17.2475	0.9966	17.6256	-596.7079	-252.0242	-3.3745	-3.4941
0.0219	0.34	1000	0.0248	-0.1014	-19.6177	0.9958	19.5163	-620.4097	-256.8191	-3.3236	-3.4703
0.0193	0.37	1100	0.0476	0.2441	-22.8685	0.9949	23.1126	-652.9178	-253.3648	-3.3700	-3.5127
0.0153	0.41	1200	0.0344	0.2337	-21.0722	0.9958	21.3059	-634.9553	-253.4690	-3.3281	-3.4433
0.1011	0.44	1300	0.0320	0.3865	-19.5099	0.9941	19.8964	-619.3322	-251.9406	-3.2086	-3.2943
0.0085	0.48	1400	0.0164	-0.3604	-24.6053	0.9958	24.2449	-670.2856	-259.4097	-3.3688	-3.5055
0.0057	0.51	1500	0.0115	-0.8584	-33.7853	0.9966	32.9269	-762.0861	-264.3898	-3.2986	-3.4455
0.0082	0.54	1600	0.0525	-0.3661	-22.4426	0.9975	22.0765	-648.6592	-259.4668	-3.3372	-3.4816
0.0128	0.58	1700	0.0514	-0.4253	-24.3063	0.9958	23.8810	-667.2958	-260.0584	-3.3102	-3.4488
0.0018	0.61	1800	0.0356	-0.3563	-24.1492	0.9966	23.7929	-665.7247	-259.3687	-3.2894	-3.4159
0.0105	0.65	1900	0.0381	-0.9566	-33.8957	0.9958	32.9391	-763.1902	-265.3718	-3.3840	-3.5348
0.006	0.68	2000	0.0072	-0.1403	-26.2483	0.9975	26.1080	-686.7160	-257.2083	-3.3371	-3.4805
0.0026	0.71	2100	0.0102	-0.1870	-29.0470	0.9966	28.8600	-714.7033	-257.6760	-3.3557	-3.4974
0.0038	0.75	2200	0.0078	-0.4803	-29.8773	0.9966	29.3970	-723.0064	-260.6087	-3.3551	-3.5046
0.0011	0.78	2300	0.0075	-0.4771	-28.4348	0.9966	27.9577	-708.5814	-260.5770	-3.3459	-3.4948
0.0033	0.82	2400	0.0047	-0.1998	-28.0030	0.9983	27.8032	-704.2631	-257.8039	-3.3489	-3.4950
0.0051	0.85	2500	0.0048	-0.2771	-29.2358	0.9992	28.9587	-716.5906	-258.5765	-3.3025	-3.4428
0.0074	0.88	2600	0.0044	-0.2089	-29.6486	0.9975	29.4396	-720.7189	-257.8950	-3.3320	-3.4805
0.0032	0.92	2700	0.0041	-0.1675	-30.1791	0.9975	30.0116	-726.0242	-257.4810	-3.3308	-3.4822
0.0023	0.95	2800	0.0038	0.0604	-29.3907	0.9983	29.4511	-718.1400	-255.2013	-3.3267	-3.4751
0.003	0.99	2900	0.0040	0.1446	-28.9793	0.9983	29.1239	-714.0264	-254.3596	-3.3257	-3.4723

Framework versions

Transformers 4.35.0
Pytorch 2.1.1+cu121
Datasets 2.14.6
Tokenizers 0.14.1

yihang7
/

Mistral-7B-Instruct-v0.1-dpo-full-1-epoch-hydrox-safe

Mistral-7B-Instruct-v0.1-dpo-full-1-epoch-hydrox-safe

Model description

Intended uses & limitations

Training and evaluation data

Training procedure

Training hyperparameters

Training results

Framework versions

Model tree for yihang7/Mistral-7B-Instruct-v0.1-dpo-full-1-epoch-hydrox-safe

Evaluation results