reward_model

This model is a fine-tuned version of distilroberta-base on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5988
  • Accuracy: 0.65
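The card ships no usage snippet. A minimal inference sketch, assuming the checkpoint loads as a standard sequence-classification reward head via `transformers` (the helper names `load_reward_model`, `reward_score`, and `preference_probability` are illustrative, not from the training code; the repo id is taken from this card):

```python
import math


def load_reward_model(repo_id: str = "paulovsantanas/reward_model"):
    """Load tokenizer and model; assumes a standard sequence-classification head.

    transformers is imported lazily so the pure helper below works
    even where the heavy dependencies are not installed.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForSequenceClassification.from_pretrained(repo_id)
    model.eval()
    return tokenizer, model


def reward_score(tokenizer, model, text: str) -> float:
    """Return a scalar score for `text`; higher = more preferred (assumption)."""
    import torch

    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    # With a single-logit head this is the reward itself; with a two-label
    # head, the last logit is assumed to be the "preferred" class.
    return logits.view(-1)[-1].item()


def preference_probability(score_chosen: float, score_rejected: float) -> float:
    """Bradley-Terry probability that the chosen text beats the rejected one."""
    return 1.0 / (1.0 + math.exp(score_rejected - score_chosen))
```

Under the Bradley-Terry reading, equal scores give a 0.5 preference probability, and a 2-point score gap gives roughly 0.88.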

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 32
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • training_steps: 500
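The scheduler is reported as linear with no warmup listed; under that assumption, the learning rate at step t decays linearly from 5e-05 to 0 over the 500 training steps. A sketch (`linear_lr` is a hypothetical helper, not part of the training code):

```python
def linear_lr(step: int, base_lr: float = 5e-5, total_steps: int = 500) -> float:
    """Linearly decay base_lr to 0 over total_steps (no warmup reported)."""
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps
```

So the rate falls from 5e-05 at step 0, to 2.5e-05 at step 250, to 0 at step 500.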

Training results

Training Loss   Epoch    Step   Validation Loss   Accuracy
0.7043          0.0150     20   0.6886            0.54
0.6671          0.0301     40   0.6924            0.53
0.6131          0.0451     60   0.7038            0.58
0.6149          0.0602     80   0.6759            0.60
0.6539          0.0752    100   0.6593            0.58
0.6671          0.0902    120   0.7227            0.59
0.6863          0.1053    140   0.6452            0.58
0.6332          0.1203    160   0.6394            0.64
0.6259          0.1353    180   0.6630            0.61
0.6257          0.1504    200   0.6369            0.61
0.5376          0.1654    220   0.6460            0.62
0.6734          0.1805    240   0.6404            0.62
0.7240          0.1955    260   0.7469            0.60
0.5410          0.2105    280   0.6295            0.64
0.5495          0.2256    300   0.6182            0.65
0.7581          0.2406    320   0.6262            0.60
0.5234          0.2556    340   0.6228            0.63
0.5787          0.2707    360   0.6208            0.64
0.6025          0.2857    380   0.6069            0.65
0.6061          0.3008    400   0.6166            0.65
0.8482          0.3158    420   0.6078            0.65
0.5613          0.3308    440   0.5940            0.65
0.7284          0.3459    460   0.6042            0.65
0.5778          0.3609    480   0.5990            0.65
0.6848          0.3759    500   0.5988            0.65

Framework versions

  • Transformers 4.45.2
  • PyTorch 2.5.1+cu121
  • Datasets 3.1.0
  • Tokenizers 0.20.3
Model size: 82.1M parameters (F32, Safetensors)

Model repository: paulovsantanas/reward_model (fine-tuned from distilroberta-base)