fresh-12-layer-swag-distill-of-fresh-12-layer-gpqa

This model is a fine-tuned version of on an unknown dataset. It achieves the following results on the evaluation set:

Model description

More information needed

More information needed

More information needed

The following hyperparameters were used during training:

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	63	14.5865	0.2424
No log	2.0	126	13.4274	0.2879
No log	3.0	189	14.5755	0.3434
No log	4.0	252	14.6965	0.3586
No log	5.0	315	14.6065	0.3737
No log	6.0	378	13.0578	0.3737
No log	7.0	441	13.1651	0.3586
1.7518	8.0	504	13.7708	0.3636
1.7518	9.0	567	13.5531	0.3535
1.7518	10.0	630	13.3979	0.3384
1.7518	11.0	693	13.8865	0.3434
1.7518	12.0	756	13.8410	0.3687
1.7518	13.0	819	15.6234	0.3283
1.7518	14.0	882	17.4878	0.3485
1.7518	15.0	945	16.2413	0.3081
2.2378	16.0	1008	14.6003	0.3232
2.2378	17.0	1071	16.5984	0.3232
2.2378	18.0	1134	14.3157	0.3788
2.2378	19.0	1197	13.5424	0.3485
2.2378	20.0	1260	13.3978	0.3586