tags:
  - generated_from_trainer
metrics:
  - accuracy
model-index:
  - name: tinystories_1layer_attn_mlp_C10k_k100
    results: []

tinystories_1layer_attn_mlp_C10k_k100

This model is a fine-tuned version of on an unknown dataset. It achieves the following results on the evaluation set:

Model description

More information needed

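Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

Training results
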
Training Loss	Epoch	Step	Validation Loss	Accuracy	Multicode K	Mse/layer0	Input Norm/layer0	Output Norm/layer0
2.5072	0.05	500	2.4764	0.4579	1	841.1602	31.9977	4.9114
2.2285	0.1	1000	2.2265	0.4926	1	792.3023	31.9980	7.5524
2.1472	0.16	1500	2.1584	0.5025	1	761.8683	31.9980	8.9239
2.1144	0.21	2000	2.1128	0.5090	1	737.1843	31.9979	9.8992
2.0847	0.26	2500	2.0791	0.5142	1	716.9390	31.9979	10.6577
2.0439	0.31	3000	2.0482	0.5185	1	698.7266	31.9979	11.3599
2.0263	0.37	3500	2.0253	0.5224	1	682.2680	31.9979	12.0105
1.9906	0.42	4000	2.0066	0.5253	1	669.1965	31.9979	12.5568
1.9852	0.47	4500	1.9898	0.5279	1	657.5872	31.9979	13.0526
1.9687	0.52	5000	1.9757	0.5300	1	648.2462	31.9979	13.4496
1.9672	0.57	5500	1.9620	0.5321	1	640.0822	31.9978	13.8078
1.9441	0.63	6000	1.9513	0.5339	1	633.8831	31.9978	14.1018
1.9408	0.68	6500	1.9397	0.5358	1	628.0929	31.9977	14.3550
1.9256	0.73	7000	1.9302	0.5374	1	623.2726	31.9977	14.5534
1.9204	0.78	7500	1.9225	0.5381	1	619.4573	31.9977	14.7258
1.907	0.84	8000	1.9150	0.5393	1	616.4379	31.9976	14.8625
1.8931	0.89	8500	1.9076	0.5408	1	613.7874	31.9976	14.9685
1.9021	0.94	9000	1.9021	0.5417	1	612.0126	31.9975	15.0379
1.8967	0.99	9500	1.8970	0.5426	1	610.6121	31.9975	15.0932
1.8942	1.04	10000	1.8957	0.5429	1	611.1572	31.9975	15.0872
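The evaluation columns of the table can also be read programmatically. The following is a minimal sketch that converts validation loss to perplexity and picks the evaluation step with the lowest validation loss. It assumes that the Validation Loss column is the mean per-token cross-entropy in nats. The card does not state this, although it is the usual convention for causal language models trained with the Hugging Face Trainer. `EVAL_LOG`, `perplexity`, `best_checkpoint` and `loss_drops` are hypothetical names introduced here for illustration. The numbers are transcribed from the table above.

```python
import math

# (step, validation_loss, accuracy) transcribed from the table above.
EVAL_LOG = [
    (500, 2.4764, 0.4579),
    (1000, 2.2265, 0.4926),
    (1500, 2.1584, 0.5025),
    (2000, 2.1128, 0.5090),
    (2500, 2.0791, 0.5142),
    (3000, 2.0482, 0.5185),
    (3500, 2.0253, 0.5224),
    (4000, 2.0066, 0.5253),
    (4500, 1.9898, 0.5279),
    (5000, 1.9757, 0.5300),
    (5500, 1.9620, 0.5321),
    (6000, 1.9513, 0.5339),
    (6500, 1.9397, 0.5358),
    (7000, 1.9302, 0.5374),
    (7500, 1.9225, 0.5381),
    (8000, 1.9150, 0.5393),
    (8500, 1.9076, 0.5408),
    (9000, 1.9021, 0.5417),
    (9500, 1.8970, 0.5426),
    (10000, 1.8957, 0.5429),
]


def perplexity(mean_nll_nats: float) -> float:
    """Perplexity = exp(mean per-token negative log-likelihood).

    Assumes the loss is measured in nats (natural log), as with
    PyTorch's CrossEntropyLoss; this is an assumption, not stated in the card.
    """
    return math.exp(mean_nll_nats)


def best_checkpoint(log):
    """Return the (step, loss, accuracy) row with the lowest validation loss."""
    return min(log, key=lambda row: row[1])


def loss_drops(log):
    """Validation-loss decrease between consecutive evaluations, keyed by the later step."""
    return [(later[0], earlier[1] - later[1]) for earlier, later in zip(log, log[1:])]


if __name__ == "__main__":
    step, loss, acc = best_checkpoint(EVAL_LOG)
    print(f"best step {step}: val loss {loss:.4f}, ppl {perplexity(loss):.3f}, acc {acc:.4f}")
    for s, drop in loss_drops(EVAL_LOG):
        print(f"step {s}: loss dropped by {drop:.4f}")
```

In this run, validation loss falls at every evaluation, so the lowest-loss checkpoint is simply the last one (step 10000). The per-interval drops show the curve flattening near the end of the first epoch: the loss falls by only 0.0013 between steps 9500 and 10000.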