
Mistral2_1000_STEPS_03beta_5e7rate_CDPOSFT

This model is a fine-tuned version of tsavage68/mistralit2_1000_STEPS_5e7_SFT on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6902
  • Rewards/chosen: 0.0173
  • Rewards/rejected: 0.0111
  • Rewards/accuracies: 0.5099
  • Rewards/margins: 0.0062
  • Logps/rejected: -26.5199
  • Logps/chosen: -23.6139
  • Logits/rejected: -2.3100
  • Logits/chosen: -2.3096
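
The reward numbers above follow the usual DPO convention: a completion's implicit reward is β times the log-probability ratio between the policy and the reference (SFT) model, and the margin is the chosen reward minus the rejected reward. A minimal sketch of that bookkeeping (β = 0.3 is inferred from the "03beta" in the model name; the function and tensor names are illustrative, not from this repo):

```python
import torch

def dpo_rewards(policy_logps_chosen: torch.Tensor,
                policy_logps_rejected: torch.Tensor,
                ref_logps_chosen: torch.Tensor,
                ref_logps_rejected: torch.Tensor,
                beta: float = 0.3):
    # Implicit DPO reward: beta * (log p_policy - log p_ref),
    # with log-probs summed over the completion tokens.
    rewards_chosen = beta * (policy_logps_chosen - ref_logps_chosen)
    rewards_rejected = beta * (policy_logps_rejected - ref_logps_rejected)
    margins = rewards_chosen - rewards_rejected        # "Rewards/margins"
    accuracies = (margins > 0).float().mean()          # "Rewards/accuracies"
    return rewards_chosen, rewards_rejected, margins, accuracies
```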

Model description

More information needed

Intended uses & limitations

More information needed
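
Pending details from the author, a minimal inference sketch; the Hub repo id is assumed from the card title and should be verified before use:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id, taken from this card's title.
model_id = "tsavage68/Mistral2_1000_STEPS_03beta_5e7rate_CDPOSFT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [{"role": "user", "content": "Summarize the benefits of DPO fine-tuning."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```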

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-08
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
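
These settings map onto a trl DPOTrainer run. The sketch below is a hedged reconstruction using the trl 0.8-era API (newer trl moves these arguments into DPOConfig): the "CDPO" in the model name suggests conservative DPO, which trl implements as the sigmoid loss with label smoothing, but the smoothing value, the β of 0.3 (read off the "03beta" in the name), and the toy dataset are all assumptions.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "tsavage68/mistralit2_1000_STEPS_5e7_SFT"  # SFT checkpoint named above
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen DPO reference
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Mistral tokenizers ship without a pad token

# Toy preference data; the card does not name the real dataset.
train_dataset = Dataset.from_dict({
    "prompt":   ["Explain DPO in one sentence."],
    "chosen":   ["DPO fine-tunes a model directly on preference pairs."],
    "rejected": ["DPO is a kind of soup."],
})

args = TrainingArguments(
    output_dir="Mistral2_1000_STEPS_03beta_5e7rate_CDPOSFT",
    learning_rate=1e-8,              # value listed above (the name's "5e7rate" hints at 5e-7)
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # total train batch size 8
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    remove_unused_columns=False,     # required by DPOTrainer's collator
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    beta=0.3,               # inferred from "03beta" in the model name
    loss_type="sigmoid",
    label_smoothing=0.1,    # assumption: cDPO-style smoothing; value not documented
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```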

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6937 | 0.0977 | 50   | 0.6937 | 0.0063 | 0.0071 | 0.4681 | -0.0008 | -26.5333 | -23.6507 | -2.3103 | -2.3099 |
| 0.6931 | 0.1953 | 100  | 0.6934 | 0.0068 | 0.0069 | 0.4308 | -0.0002 | -26.5337 | -23.6491 | -2.3103 | -2.3099 |
| 0.6945 | 0.2930 | 150  | 0.6927 | 0.0085 | 0.0074 | 0.4527 | 0.0011  | -26.5323 | -23.6433 | -2.3104 | -2.3100 |
| 0.6954 | 0.3906 | 200  | 0.6930 | 0.0105 | 0.0098 | 0.4725 | 0.0006  | -26.5240 | -23.6368 | -2.3103 | -2.3099 |
| 0.6911 | 0.4883 | 250  | 0.6926 | 0.0138 | 0.0124 | 0.4615 | 0.0014  | -26.5154 | -23.6257 | -2.3102 | -2.3098 |
| 0.6878 | 0.5859 | 300  | 0.6919 | 0.0233 | 0.0205 | 0.4681 | 0.0028  | -26.4886 | -23.5941 | -2.3100 | -2.3096 |
| 0.6899 | 0.6836 | 350  | 0.6910 | 0.0119 | 0.0072 | 0.5055 | 0.0047  | -26.5329 | -23.6321 | -2.3097 | -2.3093 |
| 0.6886 | 0.7812 | 400  | 0.6907 | 0.0202 | 0.0149 | 0.4989 | 0.0053  | -26.5071 | -23.6043 | -2.3100 | -2.3096 |
| 0.6927 | 0.8789 | 450  | 0.6915 | 0.0216 | 0.0180 | 0.4725 | 0.0036  | -26.4968 | -23.5995 | -2.3100 | -2.3096 |
| 0.6886 | 0.9766 | 500  | 0.6917 | 0.0198 | 0.0166 | 0.4571 | 0.0032  | -26.5016 | -23.6056 | -2.3102 | -2.3097 |
| 0.6868 | 1.0742 | 550  | 0.6916 | 0.0203 | 0.0167 | 0.4945 | 0.0036  | -26.5011 | -23.6041 | -2.3097 | -2.3093 |
| 0.6862 | 1.1719 | 600  | 0.6911 | 0.0198 | 0.0153 | 0.5033 | 0.0045  | -26.5058 | -23.6057 | -2.3099 | -2.3095 |
| 0.6869 | 1.2695 | 650  | 0.6913 | 0.0210 | 0.0171 | 0.5077 | 0.0039  | -26.5000 | -23.6017 | -2.3098 | -2.3094 |
| 0.6921 | 1.3672 | 700  | 0.6911 | 0.0221 | 0.0177 | 0.4879 | 0.0044  | -26.4979 | -23.5979 | -2.3104 | -2.3099 |
| 0.6883 | 1.4648 | 750  | 0.6916 | 0.0223 | 0.0187 | 0.4791 | 0.0035  | -26.4944 | -23.5974 | -2.3102 | -2.3098 |
| 0.6883 | 1.5625 | 800  | 0.6904 | 0.0184 | 0.0125 | 0.5011 | 0.0059  | -26.5152 | -23.6104 | -2.3100 | -2.3096 |
| 0.6876 | 1.6602 | 850  | 0.6904 | 0.0181 | 0.0123 | 0.5055 | 0.0059  | -26.5159 | -23.6112 | -2.3100 | -2.3096 |
| 0.6845 | 1.7578 | 900  | 0.6902 | 0.0173 | 0.0111 | 0.5099 | 0.0062  | -26.5199 | -23.6139 | -2.3100 | -2.3096 |
| 0.6892 | 1.8555 | 950  | 0.6902 | 0.0173 | 0.0111 | 0.5099 | 0.0062  | -26.5199 | -23.6139 | -2.3100 | -2.3096 |
| 0.6878 | 1.9531 | 1000 | 0.6902 | 0.0173 | 0.0111 | 0.5099 | 0.0062  | -26.5199 | -23.6139 | -2.3100 | -2.3096 |

Framework versions

  • Transformers 4.40.1
  • Pytorch 2.0.0+cu117
  • Datasets 2.19.1
  • Tokenizers 0.19.1