fresh-2-layer-swag-distill-of-fresh-2-layer-gpqa

This model is a fine-tuned version of an unspecified base model on an unknown dataset. Its evaluation-set results are given per epoch in the table of training results below.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training hyperparameters

More information needed

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	63	15.4510	0.2576
No log	2.0	126	17.9625	0.3232
No log	3.0	189	15.3798	0.3434
No log	4.0	252	15.4925	0.2929
No log	5.0	315	18.1665	0.3283
No log	6.0	378	19.0829	0.3384
No log	7.0	441	24.6057	0.3737
2.1946	8.0	504	20.6331	0.3333
2.1946	9.0	567	18.3985	0.3283
2.1946	10.0	630	19.1103	0.3535
2.1946	11.0	693	18.6291	0.3636
2.1946	12.0	756	22.3409	0.3333
2.1946	13.0	819	18.9510	0.3434
2.1946	14.0	882	20.9000	0.3485
2.1946	15.0	945	18.1215	0.3384
0.284	16.0	1008	19.2466	0.3434
0.284	17.0	1071	18.9343	0.3384
0.284	18.0	1134	19.4002	0.3586
0.284	19.0	1197	18.9731	0.3535
0.284	20.0	1260	19.2574	0.3636
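
The table above can be summarized programmatically. The following is a minimal sketch, not part of the original training code: it copies the epoch, step, validation loss, and accuracy columns from the table by hand and picks out the epochs with the highest accuracy and the lowest validation loss. The names `EpochResult`, `RESULTS`, `best_accuracy`, `lowest_loss`, and `steps_per_epoch` are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class EpochResult(NamedTuple):
    """One row of the training results table (hypothetical helper type)."""
    epoch: int
    step: int
    val_loss: float
    accuracy: float


# Values copied by hand from the training results table above.
RESULTS = [EpochResult(*row) for row in [
    (1, 63, 15.4510, 0.2576),
    (2, 126, 17.9625, 0.3232),
    (3, 189, 15.3798, 0.3434),
    (4, 252, 15.4925, 0.2929),
    (5, 315, 18.1665, 0.3283),
    (6, 378, 19.0829, 0.3384),
    (7, 441, 24.6057, 0.3737),
    (8, 504, 20.6331, 0.3333),
    (9, 567, 18.3985, 0.3283),
    (10, 630, 19.1103, 0.3535),
    (11, 693, 18.6291, 0.3636),
    (12, 756, 22.3409, 0.3333),
    (13, 819, 18.9510, 0.3434),
    (14, 882, 20.9000, 0.3485),
    (15, 945, 18.1215, 0.3384),
    (16, 1008, 19.2466, 0.3434),
    (17, 1071, 18.9343, 0.3384),
    (18, 1134, 19.4002, 0.3586),
    (19, 1197, 18.9731, 0.3535),
    (20, 1260, 19.2574, 0.3636),
]]

# Epoch with the highest evaluation accuracy.
best_accuracy = max(RESULTS, key=lambda r: r.accuracy)

# Epoch with the lowest validation loss.
lowest_loss = min(RESULTS, key=lambda r: r.val_loss)

# Optimizer steps per epoch, derived from the first row and checked on all rows.
steps_per_epoch = RESULTS[0].step // RESULTS[0].epoch
assert all(r.step == r.epoch * steps_per_epoch for r in RESULTS)

print(f"Highest accuracy: {best_accuracy.accuracy} at epoch {best_accuracy.epoch}")
print(f"Lowest validation loss: {lowest_loss.val_loss} at epoch {lowest_loss.epoch}")
print(f"Steps per epoch: {steps_per_epoch}")
```

The two criteria disagree here: the epoch with the highest accuracy (epoch 7) is also the one with the highest validation loss in the table, while the lowest validation loss occurs at epoch 3. Which checkpoint counts as "best" therefore depends on the metric chosen. The "No log" entries before step 504 would be consistent with training loss being logged every 500 steps, though the card does not state the logging interval.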