Mistral2_1000_STEPS_01beta_1e7rate_CDPOSFT

This model is a fine-tuned version of tsavage68/mistralit2_1000_STEPS_5e7_SFT on an unknown dataset. It achieves the following results on the evaluation set (the reward columns are defined in the sketch after this list):

  • Loss: 0.5644
  • Rewards/chosen: -0.8921
  • Rewards/rejected: -1.9046
  • Rewards/accuracies: 0.6330
  • Rewards/margins: 1.0126
  • Logps/rejected: -45.6030
  • Logps/chosen: -32.5922
  • Logits/rejected: -2.0662
  • Logits/chosen: -2.0658
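
These reward columns follow the standard DPO definition: the implicit reward for a completion is beta times the gap between the policy and reference log-probabilities. A minimal sketch, assuming beta = 0.1 (inferred from "01beta" in the model name) and reference log-probabilities back-solved from the final eval row for illustration:

```python
# Sketch of the implicit DPO reward behind the metrics above.
# beta = 0.1 is an assumption inferred from "01beta" in the model name;
# the reference log-probs are back-solved from the final eval row.
beta = 0.1

def dpo_reward(policy_logp: float, ref_logp: float, beta: float = beta) -> float:
    """Implicit DPO reward: beta * (log pi_policy(y|x) - log pi_ref(y|x))."""
    return beta * (policy_logp - ref_logp)

reward_chosen = dpo_reward(policy_logp=-32.5922, ref_logp=-23.6712)    # ~ -0.8921
reward_rejected = dpo_reward(policy_logp=-45.6030, ref_logp=-26.5570)  # ~ -1.9046

margin = reward_chosen - reward_rejected     # Rewards/margins ~ 1.0126
accurate = reward_chosen > reward_rejected   # averaged over pairs -> Rewards/accuracies
```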

Model description

More information needed

Intended uses & limitations

More information needed
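
Pending usage guidance from the author, here is a minimal inference sketch. It assumes the repository id tsavage68/Mistral2_1000_STEPS_01beta_1e7rate_CDPOSFT and that the tokenizer ships a Mistral-Instruct-style chat template; adjust both if they differ:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id assumed from the model name above.
model_id = "tsavage68/Mistral2_1000_STEPS_01beta_1e7rate_CDPOSFT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Format the prompt with the tokenizer's chat template (assumed present).
messages = [{"role": "user", "content": "Summarize what DPO training does."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```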

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-07
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
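
A minimal sketch of this configuration, assuming the Hugging Face transformers TrainingArguments API (the trainer class and dataset are not specified in this card; output_dir is a placeholder):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Mistral2_1000_STEPS_01beta_1e7rate_CDPOSFT",  # placeholder
    learning_rate=1e-7,
    per_device_train_batch_size=4,   # train_batch_size: 4
    per_device_eval_batch_size=1,    # eval_batch_size: 1
    seed=42,
    gradient_accumulation_steps=2,   # total_train_batch_size: 4 * 2 = 8
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,                  # training_steps: 1000
)
```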

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6891 | 0.0977 | 50 | 0.6864 | 0.0309 | 0.0171 | 0.6220 | 0.0138 | -26.3858 | -23.3627 | -2.3067 | -2.3063 |
| 0.6478 | 0.1953 | 100 | 0.6468 | 0.0940 | -0.0114 | 0.6418 | 0.1054 | -26.6706 | -22.7312 | -2.2655 | -2.2651 |
| 0.5634 | 0.2930 | 150 | 0.5932 | 0.1716 | -0.1066 | 0.6593 | 0.2783 | -27.6232 | -21.9553 | -2.2109 | -2.2105 |
| 0.4312 | 0.3906 | 200 | 0.5617 | -0.0095 | -0.5043 | 0.6396 | 0.4948 | -31.6002 | -23.7667 | -2.1590 | -2.1586 |
| 0.4711 | 0.4883 | 250 | 0.5499 | -0.0116 | -0.6823 | 0.6462 | 0.6707 | -33.3801 | -23.7878 | -2.1283 | -2.1279 |
| 0.4014 | 0.5859 | 300 | 0.5465 | -0.4289 | -1.1546 | 0.6484 | 0.7257 | -38.1031 | -27.9606 | -2.1330 | -2.1326 |
| 0.4439 | 0.6836 | 350 | 0.5634 | -0.7104 | -1.5436 | 0.6462 | 0.8331 | -41.9925 | -30.7762 | -2.1022 | -2.1018 |
| 0.4768 | 0.7812 | 400 | 0.5594 | -0.6950 | -1.5434 | 0.6571 | 0.8484 | -41.9907 | -30.6215 | -2.1034 | -2.1030 |
| 0.4891 | 0.8789 | 450 | 0.5525 | -0.7222 | -1.5890 | 0.6505 | 0.8668 | -42.4472 | -30.8936 | -2.0946 | -2.0942 |
| 0.4048 | 0.9766 | 500 | 0.5463 | -0.4609 | -1.3226 | 0.6571 | 0.8617 | -39.7828 | -28.2802 | -2.1066 | -2.1062 |
| 0.3051 | 1.0742 | 550 | 0.5533 | -0.7106 | -1.6492 | 0.6440 | 0.9386 | -43.0491 | -30.7781 | -2.0836 | -2.0832 |
| 0.3145 | 1.1719 | 600 | 0.5586 | -0.8155 | -1.7726 | 0.6330 | 0.9571 | -44.2825 | -31.8265 | -2.0777 | -2.0774 |
| 0.4126 | 1.2695 | 650 | 0.5618 | -0.8660 | -1.8549 | 0.6374 | 0.9889 | -45.1055 | -32.3315 | -2.0720 | -2.0716 |
| 0.3106 | 1.3672 | 700 | 0.5631 | -0.8991 | -1.8960 | 0.6308 | 0.9969 | -45.5172 | -32.6628 | -2.0686 | -2.0682 |
| 0.3095 | 1.4648 | 750 | 0.5638 | -0.8960 | -1.9056 | 0.6308 | 1.0096 | -45.6128 | -32.6320 | -2.0670 | -2.0666 |
| 0.3638 | 1.5625 | 800 | 0.5660 | -0.8946 | -1.9044 | 0.6352 | 1.0098 | -45.6007 | -32.6176 | -2.0663 | -2.0659 |
| 0.348  | 1.6602 | 850 | 0.5645 | -0.8960 | -1.9094 | 0.6374 | 1.0134 | -45.6511 | -32.6320 | -2.0665 | -2.0661 |
| 0.3272 | 1.7578 | 900 | 0.5653 | -0.8971 | -1.9081 | 0.6352 | 1.0110 | -45.6377 | -32.6428 | -2.0662 | -2.0658 |
| 0.3261 | 1.8555 | 950 | 0.5644 | -0.8920 | -1.9045 | 0.6374 | 1.0124 | -45.6014 | -32.5920 | -2.0662 | -2.0659 |
| 0.2913 | 1.9531 | 1000 | 0.5644 | -0.8921 | -1.9046 | 0.6330 | 1.0126 | -45.6030 | -32.5922 | -2.0662 | -2.0658 |

Framework versions

  • Transformers 4.40.1
  • Pytorch 2.0.0+cu117
  • Datasets 2.19.0
  • Tokenizers 0.19.1