Mistral2_1000_STEPS_01beta_1e8rate_CDPOSFT

This model is a fine-tuned version of tsavage68/mistralit2_1000_STEPS_5e7_SFT on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6926
  • Rewards/chosen: 0.0065
  • Rewards/rejected: 0.0053
  • Rewards/accuracies: 0.4615
  • Rewards/margins: 0.0012
  • Logps/rejected: -26.5038
  • Logps/chosen: -23.6067
  • Logits/rejected: -2.3100
  • Logits/chosen: -2.3095
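
Usage is not otherwise documented on this card, so here is a minimal loading sketch. The repository id is assumed to match the card title, and the tokenizer is assumed to ship a Mistral-style chat template; neither is confirmed by the card itself.

```python
# Hypothetical usage sketch; the repo id is assumed to match the card title.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "tsavage68/Mistral2_1000_STEPS_01beta_1e8rate_CDPOSFT"  # assumption
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.float16, device_map="auto"
)

messages = [{"role": "user", "content": "Summarize direct preference optimization in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```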

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the reproduction sketch after the list):

  • learning_rate: 1e-08
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
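
The model name suggests a DPO run with beta = 0.1 ("01beta") and a 1e-08 learning rate on top of the SFT checkpoint, with "CDPO" hinting at the conservative-DPO variant (implemented in trl as label smoothing on the sigmoid loss). Below is a minimal sketch of such a run with trl's DPOTrainer under those readings; the preference dataset rows, the label-smoothing value, and the exact trl version are assumptions, not facts from this card.

```python
# Hedged reproduction sketch of the training setup implied by the
# hyperparameters above. Dataset, label_smoothing value, and trl version
# are assumptions; only the TrainingArguments values come from this card.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "tsavage68/mistralit2_1000_STEPS_5e7_SFT"  # SFT checkpoint named in this card
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder preference pairs; the real training data is not documented here.
train_dataset = Dataset.from_dict({
    "prompt":   ["Explain DPO in one sentence."],
    "chosen":   ["DPO fits a policy directly to preference pairs without a separate reward model."],
    "rejected": ["DPO is a tokenizer."],
})

args = TrainingArguments(
    output_dir="Mistral2_1000_STEPS_01beta_1e8rate_CDPOSFT",
    learning_rate=1e-8,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # 4 x 2 = total train batch size of 8
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,                        # Adam betas/epsilon match the defaults listed above
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,          # trl clones a frozen reference model when None
    args=args,
    beta=0.1,                # "01beta" in the model name
    label_smoothing=0.1,     # assumption: cDPO via label smoothing; value not stated on the card
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

At a 1e-08 learning rate the policy barely moves from the reference, which is consistent with the validation losses below hovering near ln(2) ≈ 0.6931 (the DPO sigmoid loss at zero reward margin) and with the near-zero reward margins.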

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6942        | 0.0977 | 50   | 0.6931          | 0.0016         | 0.0015           | 0.4549             | 0.0001          | -26.5417       | -23.6558     | -2.3103         | -2.3098       |
| 0.6924        | 0.1953 | 100  | 0.6933          | 0.0004         | 0.0006           | 0.4352             | -0.0002         | -26.5508       | -23.6681     | -2.3107         | -2.3103       |
| 0.6936        | 0.2930 | 150  | 0.6931          | 0.0013         | 0.0013           | 0.4527             | 0.0001          | -26.5442       | -23.6585     | -2.3106         | -2.3102       |
| 0.6934        | 0.3906 | 200  | 0.6925          | 0.0034         | 0.0021           | 0.4791             | 0.0013          | -26.5358       | -23.6374     | -2.3104         | -2.3099       |
| 0.6923        | 0.4883 | 250  | 0.6928          | 0.0053         | 0.0044           | 0.4967             | 0.0008          | -26.5125       | -23.6191     | -2.3102         | -2.3098       |
| 0.6914        | 0.5859 | 300  | 0.6924          | 0.0058         | 0.0043           | 0.4879             | 0.0015          | -26.5142       | -23.6138     | -2.3102         | -2.3098       |
| 0.6922        | 0.6836 | 350  | 0.6926          | 0.0072         | 0.0059           | 0.4923             | 0.0012          | -26.4974       | -23.6001     | -2.3104         | -2.3099       |
| 0.6913        | 0.7812 | 400  | 0.6924          | 0.0048         | 0.0034           | 0.4945             | 0.0015          | -26.5233       | -23.6235     | -2.3098         | -2.3094       |
| 0.6917        | 0.8789 | 450  | 0.6923          | 0.0058         | 0.0041           | 0.5011             | 0.0017          | -26.5157       | -23.6136     | -2.3100         | -2.3096       |
| 0.6909        | 0.9766 | 500  | 0.6925          | 0.0052         | 0.0038           | 0.4813             | 0.0014          | -26.5186       | -23.6196     | -2.3101         | -2.3097       |
| 0.6906        | 1.0742 | 550  | 0.6925          | 0.0073         | 0.0059           | 0.4989             | 0.0013          | -26.4974       | -23.5988     | -2.3100         | -2.3096       |
| 0.692         | 1.1719 | 600  | 0.6925          | 0.0063         | 0.0049           | 0.5033             | 0.0014          | -26.5080       | -23.6092     | -2.3099         | -2.3095       |
| 0.6918        | 1.2695 | 650  | 0.6924          | 0.0055         | 0.0041           | 0.4857             | 0.0015          | -26.5160       | -23.6163     | -2.3099         | -2.3095       |
| 0.6918        | 1.3672 | 700  | 0.6923          | 0.0066         | 0.0048           | 0.5165             | 0.0018          | -26.5093       | -23.6059     | -2.3100         | -2.3096       |
| 0.6915        | 1.4648 | 750  | 0.6921          | 0.0078         | 0.0057           | 0.5121             | 0.0022          | -26.5002       | -23.5933     | -2.3100         | -2.3096       |
| 0.6917        | 1.5625 | 800  | 0.6923          | 0.0070         | 0.0053           | 0.4901             | 0.0017          | -26.5038       | -23.6016     | -2.3099         | -2.3095       |
| 0.692         | 1.6602 | 850  | 0.6926          | 0.0068         | 0.0057           | 0.4813             | 0.0012          | -26.5000       | -23.6033     | -2.3099         | -2.3094       |
| 0.6913        | 1.7578 | 900  | 0.6926          | 0.0065         | 0.0053           | 0.4615             | 0.0012          | -26.5038       | -23.6067     | -2.3100         | -2.3095       |
| 0.6917        | 1.8555 | 950  | 0.6926          | 0.0065         | 0.0053           | 0.4615             | 0.0012          | -26.5038       | -23.6067     | -2.3100         | -2.3095       |
| 0.6911        | 1.9531 | 1000 | 0.6926          | 0.0065         | 0.0053           | 0.4615             | 0.0012          | -26.5038       | -23.6067     | -2.3100         | -2.3095       |

Framework versions

  • Transformers 4.40.1
  • Pytorch 2.0.0+cu117
  • Datasets 2.19.0
  • Tokenizers 0.19.1
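
To reproduce the environment, the listed versions can be pinned directly, e.g. in a requirements file like the sketch below. The card lists Pytorch 2.0.0+cu117, whose wheels come from the PyTorch index rather than PyPI; trl is not version-listed on this card, so it is left unpinned here.

```text
# requirements.txt sketch from the versions listed above
transformers==4.40.1
datasets==2.19.0
tokenizers==0.19.1
torch==2.0.0  # card lists 2.0.0+cu117; install via --index-url https://download.pytorch.org/whl/cu117
```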