smolm-autoreg-bpe-counterfactual_babylm_naans_new-1e-4

This model was trained from scratch on the kanishka/counterfactual_babylm_naans_new dataset. It achieves the following results on the evaluation set:

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
4.0514	1.0	18595	4.2368	0.3093
3.5633	2.0	37190	3.7425	0.3637
3.3923	3.0	55785	3.5711	0.3809
3.2852	4.0	74380	3.5150	0.3880
3.2225	5.0	92975	3.4473	0.3934
3.1717	6.0	111570	3.4466	0.3969
3.1280	7.0	130165	3.4203	0.3993
3.0952	8.0	148760	3.3999	0.4015
3.0633	9.0	167355	3.4023	0.4025
3.0408	10.0	185950	3.4020	0.4035
3.0104	11.0	204545	3.3966	0.4037
2.9874	12.0	223140	3.3944	0.4045
2.9712	13.0	241735	3.3882	0.4057
2.9451	14.0	260330	3.3960	0.4058
2.9277	15.0	278925	3.4037	0.4061
2.9085	16.0	297520	3.4048	0.4062
2.8914	17.0	316115	3.4033	0.4061
2.8772	18.0	334710	3.4094	0.4066
2.8635	19.0	353305	3.4112	0.4067
2.8506	20.0	371900	3.4162	0.4067
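In the table above, validation loss reaches its minimum at epoch 13 (3.3882) and then drifts up to 3.4162 by epoch 20. Over the same span, training loss keeps falling and accuracy rises slightly. The short standard-library Python sketch below re-derives these observations from the table. It finds the epoch with the lowest validation loss, checks that the Step column advances by a constant 18,595 steps per epoch, and converts loss to perplexity.

The perplexity conversion assumes the reported loss is mean per-token cross-entropy in nats, which is the usual convention for PyTorch language-model losses. The card does not state this. The helper names (`perplexity`, `best_by_validation_loss`, `steps_per_epoch`) are illustrative and do not belong to any library.

```python
import math

# Rows transcribed from the training-results table above:
# (epoch, step, training_loss, validation_loss, accuracy)
ROWS = [
    (1, 18595, 4.0514, 4.2368, 0.3093),
    (2, 37190, 3.5633, 3.7425, 0.3637),
    (3, 55785, 3.3923, 3.5711, 0.3809),
    (4, 74380, 3.2852, 3.5150, 0.3880),
    (5, 92975, 3.2225, 3.4473, 0.3934),
    (6, 111570, 3.1717, 3.4466, 0.3969),
    (7, 130165, 3.1280, 3.4203, 0.3993),
    (8, 148760, 3.0952, 3.3999, 0.4015),
    (9, 167355, 3.0633, 3.4023, 0.4025),
    (10, 185950, 3.0408, 3.4020, 0.4035),
    (11, 204545, 3.0104, 3.3966, 0.4037),
    (12, 223140, 2.9874, 3.3944, 0.4045),
    (13, 241735, 2.9712, 3.3882, 0.4057),
    (14, 260330, 2.9451, 3.3960, 0.4058),
    (15, 278925, 2.9277, 3.4037, 0.4061),
    (16, 297520, 2.9085, 3.4048, 0.4062),
    (17, 316115, 2.8914, 3.4033, 0.4061),
    (18, 334710, 2.8772, 3.4094, 0.4066),
    (19, 353305, 2.8635, 3.4112, 0.4067),
    (20, 371900, 2.8506, 3.4162, 0.4067),
]


def perplexity(loss):
    """exp(loss); meaningful only if loss is mean per-token cross-entropy in nats (assumed)."""
    return math.exp(loss)


def best_by_validation_loss(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[3])


def steps_per_epoch(rows):
    """Infer optimizer steps per epoch from the first row and check every row agrees."""
    first_epoch, first_step = rows[0][0], rows[0][1]
    per_epoch = first_step // first_epoch
    for epoch, step, *_ in rows:
        if step != epoch * per_epoch:
            raise ValueError(f"epoch {epoch}: step {step} != {epoch} * {per_epoch}")
    return per_epoch


if __name__ == "__main__":
    best = best_by_validation_loss(ROWS)
    final = ROWS[-1]
    print(f"steps per epoch: {steps_per_epoch(ROWS)}")
    for label, row in (("lowest validation loss", best), ("final epoch", final)):
        epoch, _, train_loss, val_loss, acc = row
        # Summarize the checkpoint: perplexity and the train/validation loss gap.
        print(
            f"{label}: epoch {epoch}, val loss {val_loss:.4f} "
            f"(ppl ~{perplexity(val_loss):.1f}), "
            f"train/val gap {val_loss - train_loss:.4f}, accuracy {acc:.4f}"
        )
```

Under the nats assumption, validation perplexity is just under 30 at epoch 13 and just over 30 at epoch 20. The train/validation gap keeps widening after epoch 13.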