fresh-8-layer-swag-distill-of-fresh-8-layer-gpqa

This model is a fine-tuned version of an unspecified base model on an unknown dataset. Its per-epoch results on the evaluation set are listed in the training results table at the end of this card.

Model description

More information needed

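The hyperparameters used during training are not listed here. Training ran for 20 epochs of 63 steps each (1,260 steps in total), with the following results: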

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	63	22.7715	0.2677
No log	2.0	126	24.4035	0.2879
No log	3.0	189	21.6171	0.3131
No log	4.0	252	22.9241	0.3333
No log	5.0	315	36.3034	0.3788
No log	6.0	378	22.9598	0.4040
No log	7.0	441	25.2469	0.3485
5.5235	8.0	504	29.2667	0.3687
5.5235	9.0	567	24.0718	0.3687
5.5235	10.0	630	25.5240	0.3030
5.5235	11.0	693	28.6147	0.3283
5.5235	12.0	756	33.3811	0.3434
5.5235	13.0	819	28.3026	0.3232
5.5235	14.0	882	27.7010	0.2677
5.5235	15.0	945	26.9798	0.3182
3.9997	16.0	1008	26.8561	0.3232
3.9997	17.0	1071	25.9683	0.3687
3.9997	18.0	1134	23.6478	0.3333
3.9997	19.0	1197	24.1695	0.3232
3.9997	20.0	1260	24.7100	0.3485
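
Because accuracy and validation loss fluctuate from epoch to epoch, the final row is not necessarily the best checkpoint. The following is a minimal sketch, not part of the original training code, that copies the rows of the table above into Python and picks out the best epochs. The names `EpochResult`, `RESULTS`, `best_by_accuracy`, and `best_by_validation_loss` are hypothetical and introduced only for illustration.

```python
from typing import List, NamedTuple


class EpochResult(NamedTuple):
    """One row of the training results table (hypothetical helper type)."""
    epoch: int
    step: int
    validation_loss: float
    accuracy: float


# Values copied from the table above; the Training Loss column is omitted
# because it is only logged intermittently.
RESULTS: List[EpochResult] = [
    EpochResult(1, 63, 22.7715, 0.2677),
    EpochResult(2, 126, 24.4035, 0.2879),
    EpochResult(3, 189, 21.6171, 0.3131),
    EpochResult(4, 252, 22.9241, 0.3333),
    EpochResult(5, 315, 36.3034, 0.3788),
    EpochResult(6, 378, 22.9598, 0.4040),
    EpochResult(7, 441, 25.2469, 0.3485),
    EpochResult(8, 504, 29.2667, 0.3687),
    EpochResult(9, 567, 24.0718, 0.3687),
    EpochResult(10, 630, 25.5240, 0.3030),
    EpochResult(11, 693, 28.6147, 0.3283),
    EpochResult(12, 756, 33.3811, 0.3434),
    EpochResult(13, 819, 28.3026, 0.3232),
    EpochResult(14, 882, 27.7010, 0.2677),
    EpochResult(15, 945, 26.9798, 0.3182),
    EpochResult(16, 1008, 26.8561, 0.3232),
    EpochResult(17, 1071, 25.9683, 0.3687),
    EpochResult(18, 1134, 23.6478, 0.3333),
    EpochResult(19, 1197, 24.1695, 0.3232),
    EpochResult(20, 1260, 24.7100, 0.3485),
]


def best_by_accuracy(results: List[EpochResult]) -> EpochResult:
    """Return the row with the highest evaluation accuracy."""
    return max(results, key=lambda r: r.accuracy)


def best_by_validation_loss(results: List[EpochResult]) -> EpochResult:
    """Return the row with the lowest validation loss."""
    return min(results, key=lambda r: r.validation_loss)


if __name__ == "__main__":
    top_acc = best_by_accuracy(RESULTS)
    low_loss = best_by_validation_loss(RESULTS)
    print(f"Highest accuracy: {top_acc.accuracy:.4f} at epoch {top_acc.epoch} (step {top_acc.step})")
    print(f"Lowest validation loss: {low_loss.validation_loss:.4f} at epoch {low_loss.epoch} (step {low_loss.step})")
```

On these rows, the highest accuracy (0.4040) occurs at epoch 6 and the lowest validation loss (21.6171) at epoch 3, while the final epoch reaches 0.3485 accuracy.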