metadata

tags:
  - generated_from_trainer
metrics:
  - accuracy
model-index:
  - name: tinystories_1layer_attn_mlp_C25k_k16_mse_weighted
    results: []

tinystories_1layer_attn_mlp_C25k_k16_mse_weighted

This model is a fine-tuned version of an unspecified base model on an unknown dataset. Its results on the evaluation set are listed per logged step in the table at the end of this card.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The hyperparameters used during training are not listed in this card. The following results were logged during training:

Training Loss	Epoch	Step	Validation Loss	Accuracy	Multicode K	Dead Code Fraction/layer0	Mse/layer0	Input Norm/layer0	Output Norm/layer0
2.8364	0.05	500	2.7649	0.4227	1	0.3619	634.8932	31.9979	18.0819
2.3611	0.1	1000	2.3705	0.4712	1	0.3607	568.7264	31.9979	20.6630
2.2395	0.15	1500	2.2531	0.4866	1	0.3266	550.3311	31.9979	21.3297
2.1999	0.2	2000	2.1908	0.4955	1	0.3048	539.0150	31.9980	21.7663
2.1688	0.25	2500	2.1551	0.5006	1	0.2949	530.4651	31.9980	22.0228
2.1108	0.3	3000	2.1269	0.5051	1	0.2809	524.9530	31.9981	22.2071
2.1045	0.35	3500	2.1130	0.5079	1	0.2735	523.0844	31.9982	22.3519
2.0944	0.4	4000	2.0996	0.5089	1	0.2655	519.8852	31.9983	22.3930
2.1314	0.45	4500	2.0860	0.5115	1	0.2567	517.0385	31.9983	22.4720
2.0685	1.02	5000	2.0770	0.5131	1	0.2497	514.3712	31.9984	22.4943
2.0496	1.07	5500	2.0730	0.5137	1	0.2381	513.7823	31.9985	22.5625
2.1002	1.12	6000	2.0667	0.5144	1	0.2305	510.7876	31.9986	22.5882
2.0723	1.17	6500	2.0632	0.5148	1	0.2206	510.5624	31.9986	22.6133
2.0230	1.22	7000	2.0574	0.5157	1	0.2110	509.9878	31.9987	22.6544
2.0791	1.27	7500	2.0513	0.5168	1	0.2033	507.1514	31.9987	22.7018
2.0252	1.32	8000	2.0463	0.5173	1	0.1953	505.2723	31.9988	22.7108
2.0432	1.37	8500	2.0423	0.5183	1	0.1875	502.9395	31.9988	22.7562
2.0549	1.42	9000	2.0394	0.5188	1	0.1797	502.9016	31.9988	22.7722
2.0087	1.47	9500	2.0365	0.5193	1	0.1704	504.0088	31.9989	22.7990
2.0569	2.04	10000	2.0353	0.5194	1	0.1640	501.8128	31.9989	22.8009
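
As a reading aid for the table above, here is a minimal sketch that parses the first and last logged rows and reports how the metrics changed between them. It is not part of the original training code. The helper name `parse_rows` and the choice of which rows to compare are illustrative assumptions. The numbers are copied from the table.

```python
# Column names copied from the header of the results table.
COLUMNS = [
    "Training Loss", "Epoch", "Step", "Validation Loss", "Accuracy",
    "Multicode K", "Dead Code Fraction/layer0", "Mse/layer0",
    "Input Norm/layer0", "Output Norm/layer0",
]

# First (step 500) and last (step 10000) rows of the table, tab-separated.
ROWS = """\
2.8364\t0.05\t500\t2.7649\t0.4227\t1\t0.3619\t634.8932\t31.9979\t18.0819
2.0569\t2.04\t10000\t2.0353\t0.5194\t1\t0.1640\t501.8128\t31.9989\t22.8009
"""


def parse_rows(text):
    """Turn tab-separated rows into dicts keyed by column name (values as floats)."""
    records = []
    for line in text.strip().splitlines():
        values = [float(v) for v in line.split("\t")]
        records.append(dict(zip(COLUMNS, values)))
    return records


records = parse_rows(ROWS)
first, last = records[0], records[-1]

# Differences between the last and first logged rows.
loss_change = last["Validation Loss"] - first["Validation Loss"]
acc_change = last["Accuracy"] - first["Accuracy"]
dead_change = last["Dead Code Fraction/layer0"] - first["Dead Code Fraction/layer0"]

print(f"steps {int(first['Step'])} -> {int(last['Step'])}")
print(f"validation loss change:    {loss_change:+.4f}")
print(f"accuracy change:           {acc_change:+.4f}")
print(f"dead code fraction change: {dead_change:+.4f}")
```

To compare other checkpoints, paste the corresponding rows into `ROWS`. The parser only assumes that each row has the ten tab-separated columns shown in the table header.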