
Mistral2_1000_STEPS_05beta_1e7rate_CDPOSFT

This model is a fine-tuned version of tsavage68/mistralit2_1000_STEPS_5e7_SFT on an unknown dataset; the model name indicates preference tuning with conservative DPO (cDPO) at β = 0.5 on top of the SFT checkpoint. It achieves the following results on the evaluation set:

  • Loss: 1.3136
  • Rewards/chosen: -2.3904
  • Rewards/rejected: -7.1332
  • Rewards/accuracies: 0.6593
  • Rewards/margins: 4.7427
  • Logps/rejected: -40.8232
  • Logps/chosen: -28.4526
  • Logits/rejected: -1.9252
  • Logits/chosen: -1.9252
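
Assuming these metrics follow the standard DPO logging convention (as in TRL), the reward columns report the implicit DPO reward

$$
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right),
$$

averaged separately over chosen and rejected completions. Rewards/margins is their difference (here -2.3904 - (-7.1332) ≈ 4.7427), and Rewards/accuracies is the fraction of evaluation pairs for which the chosen completion's reward exceeds the rejected one's.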

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
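
The following is a minimal sketch of a matching cDPO training setup, assuming TRL's DPOTrainer (~v0.8, contemporary with Transformers 4.40.1); it is not the author's actual script. The dataset contents, label_smoothing value, and sequence-length caps are placeholders, since none of them are documented by this card.

```python
# Minimal cDPO sketch, assuming TRL's DPOTrainer (~v0.8).
# Dataset, label_smoothing, and length caps are placeholders.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "tsavage68/mistralit2_1000_STEPS_5e7_SFT"  # SFT checkpoint named above
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Mistral tokenizers ship without a pad token

# Placeholder preference pairs; the real dataset is not documented.
pairs = Dataset.from_dict({
    "prompt": ["..."],
    "chosen": ["..."],
    "rejected": ["..."],
})

args = TrainingArguments(
    output_dir="Mistral2_1000_STEPS_05beta_1e7rate_CDPOSFT",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size 8
    learning_rate=5e-7,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    beta=0.5,              # "05beta" in the model name
    loss_type="sigmoid",
    label_smoothing=0.1,   # cDPO = DPO with label smoothing; 0.1 is a guess
    max_length=1024,       # placeholder length caps
    max_prompt_length=512,
    train_dataset=pairs,
    eval_dataset=pairs,
    tokenizer=tokenizer,
)
trainer.train()
```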

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5792 | 0.0977 | 50 | 0.5909 | -2.0186 | -2.6003 | 0.6220 | 0.5817 | -31.7574 | -27.7089 | -2.2531 | -2.2527 |
| 0.6303 | 0.1953 | 100 | 0.6667 | 3.3013 | 1.7992 | 0.6549 | 1.5021 | -22.9585 | -17.0691 | -2.2458 | -2.2455 |
| 0.5825 | 0.2930 | 150 | 0.7920 | 2.4778 | 0.7375 | 0.6505 | 1.7403 | -25.0818 | -18.7161 | -2.1900 | -2.1897 |
| 0.6114 | 0.3906 | 200 | 0.7379 | 2.2312 | 0.7238 | 0.6659 | 1.5074 | -25.1093 | -19.2092 | -2.3138 | -2.3135 |
| 0.6456 | 0.4883 | 250 | 0.8073 | 3.3802 | 1.8907 | 0.6220 | 1.4894 | -22.7754 | -16.9114 | -2.1555 | -2.1552 |
| 0.6342 | 0.5859 | 300 | 0.8059 | 3.1536 | 1.5241 | 0.6286 | 1.6295 | -23.5086 | -17.3644 | -2.2658 | -2.2655 |
| 0.6242 | 0.6836 | 350 | 0.8249 | 1.4081 | -0.6396 | 0.6659 | 2.0477 | -27.8361 | -20.8555 | -2.2305 | -2.2303 |
| 0.7214 | 0.7812 | 400 | 0.8283 | 2.4761 | 0.6640 | 0.6418 | 1.8121 | -25.2289 | -18.7195 | -2.3316 | -2.3314 |
| 0.7045 | 0.8789 | 450 | 0.8201 | 1.8174 | -0.1276 | 0.6352 | 1.9451 | -26.8121 | -20.0369 | -2.1939 | -2.1937 |
| 0.479 | 0.9766 | 500 | 0.7489 | 2.6325 | 1.0003 | 0.6593 | 1.6322 | -24.5563 | -18.4067 | -2.3131 | -2.3129 |
| 0.0869 | 1.0742 | 550 | 0.9388 | 0.3435 | -2.9890 | 0.6681 | 3.3325 | -32.5349 | -22.9847 | -2.0092 | -2.0092 |
| 0.2298 | 1.1719 | 600 | 1.1052 | -0.7335 | -4.5697 | 0.6593 | 3.8362 | -35.6963 | -25.1386 | -1.9647 | -1.9647 |
| 0.2182 | 1.2695 | 650 | 1.2321 | -1.9830 | -6.2540 | 0.6593 | 4.2711 | -39.0649 | -27.6376 | -1.9426 | -1.9426 |
| 0.0774 | 1.3672 | 700 | 1.2775 | -2.3773 | -6.9288 | 0.6615 | 4.5515 | -40.4144 | -28.4262 | -1.9328 | -1.9328 |
| 0.1026 | 1.4648 | 750 | 1.3159 | -2.4992 | -7.2166 | 0.6615 | 4.7174 | -40.9900 | -28.6701 | -1.9244 | -1.9244 |
| 0.0987 | 1.5625 | 800 | 1.3118 | -2.4534 | -7.2109 | 0.6593 | 4.7575 | -40.9786 | -28.5784 | -1.9248 | -1.9248 |
| 0.2393 | 1.6602 | 850 | 1.3108 | -2.3855 | -7.1139 | 0.6637 | 4.7283 | -40.7846 | -28.4428 | -1.9255 | -1.9255 |
| 0.2495 | 1.7578 | 900 | 1.3100 | -2.3926 | -7.1330 | 0.6637 | 4.7404 | -40.8229 | -28.4569 | -1.9264 | -1.9264 |
| 0.1851 | 1.8555 | 950 | 1.3120 | -2.4001 | -7.1405 | 0.6637 | 4.7404 | -40.8378 | -28.4718 | -1.9253 | -1.9253 |
| 0.0934 | 1.9531 | 1000 | 1.3136 | -2.3904 | -7.1332 | 0.6593 | 4.7427 | -40.8232 | -28.4526 | -1.9252 | -1.9252 |

Framework versions

  • Transformers 4.40.1
  • Pytorch 2.0.0+cu117
  • Datasets 2.19.0
  • Tokenizers 0.19.1
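
A minimal loading-and-generation sketch follows; the repo id is assumed to match the card title, and the prompt uses the Mistral-Instruct [INST] format inherited from the base model.

```python
# Usage sketch; repo id and chat format are assumptions, not documented here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/Mistral2_1000_STEPS_05beta_1e7rate_CDPOSFT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

prompt = "[INST] Summarize what beta controls in DPO. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```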