output_main

This model is a fine-tuned version of roneneldan/TinyStories-1Layer-21M on the roneneldan/TinyStories dataset. It achieves the following results on the evaluation set:

Loss: 1.6604
Accuracy: 0.5791
Multicode K: 1
Dead Code Fraction/layer0: 0.1982
Mse/layer0: 6073.8637
Input Norm/layer0: 0.7182
Output Norm/layer0: 76.7891
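
As a rough aid to reading the loss figure: if the reported loss is the mean per-token cross-entropy in nats (an assumption; the card does not state the unit), it converts to perplexity by exponentiation. The helper name `perplexity_from_loss` is illustrative and not part of any library.

```python
import math


def perplexity_from_loss(loss: float) -> float:
    """Convert a mean per-token cross-entropy (assumed to be in nats) to perplexity."""
    return math.exp(loss)


# Evaluation loss reported above; the unit (nats per token) is an assumption.
eval_loss = 1.6604
print(f"perplexity ~ {perplexity_from_loss(eval_loss):.2f}")  # roughly 5.26
```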

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0005
train_batch_size: 96
eval_batch_size: 64
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.05
training_steps: 100000
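
The scheduler settings above imply a learning-rate curve that can be sketched in plain Python. This is a minimal sketch, assuming the usual linear-warmup-then-linear-decay-to-zero shape and assuming the warmup length is the ceiling of `training_steps * lr_scheduler_warmup_ratio`; the function name `linear_schedule_lr` is hypothetical.

```python
import math

BASE_LR = 0.0005        # learning_rate
TOTAL_STEPS = 100_000   # training_steps
WARMUP_RATIO = 0.05     # lr_scheduler_warmup_ratio


def linear_schedule_lr(step: int,
                       base_lr: float = BASE_LR,
                       total_steps: int = TOTAL_STEPS,
                       warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay to zero."""
    # Assumption: warmup length is rounded up to a whole number of steps.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for s in (0, 2_500, 5_000, 52_500, 100_000):
    print(s, linear_schedule_lr(s))
```

Under these assumptions the rate peaks at 0.0005 after 5,000 steps and falls to half that value at step 52,500.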

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	Multicode K	Dead Code Fraction/layer0
2.2319	0.1	1000	1.9134	0.5317	1	1.0
1.8521	0.21	2000	1.7990	0.5495	1	1.0
1.7879	0.31	3000	1.7739	0.5557	1	1.0
1.7728	0.42	4000	1.7666	0.5564	1	1.0
1.7686	0.52	5000	1.7609	0.5595	1	1.0
1.7635	0.63	6000	1.7555	0.5598	1	1.0
1.7523	0.73	7000	1.7383	0.5632	1	1.0
1.7471	0.83	8000	1.7368	0.5643	1	1.0
1.7404	0.94	9000	1.7277	0.5659	1	1.0
1.728	1.04	10000	1.7290	0.5647	1	1.0
1.7195	1.15	11000	1.7244	0.5667	1	1.0
1.7198	1.25	12000	1.7230	0.5671	1	1.0
1.7171	1.36	13000	1.7177	0.5689	1	1.0
1.7185	1.46	14000	1.7150	0.5688	1	1.0
1.7149	1.56	15000	1.7125	0.5695	1	1.0
1.7105	1.67	16000	1.7097	0.5695	1	1.0
1.7107	1.77	17000	1.7073	0.5689	1	1.0
1.7113	1.88	18000	1.7025	0.5712	1	1.0
1.7078	1.98	19000	1.7048	0.5702	1	1.0
1.693	2.09	20000	1.7045	0.5696	1	1.0
1.6935	2.19	21000	1.7068	0.5695	1	1.0
1.6962	2.29	22000	1.7046	0.5687	1	1.0
1.6954	2.4	23000	1.7019	0.5706	1	1.0
1.6933	2.5	24000	1.7002	0.5725	1	1.0
1.6942	2.61	25000	1.6983	0.5717	1	1.0
1.6935	2.71	26000	1.6938	0.5730	1	1.0
1.6928	2.82	27000	1.6978	0.5719	1	1.0
1.6927	2.92	28000	1.6935	0.5715	1	1.0
1.6855	3.02	29000	1.6978	0.5726	1	1.0
1.6773	3.13	30000	1.6951	0.5732	1	1.0
1.6788	3.23	31000	1.6926	0.5728	1	1.0
1.6813	3.34	32000	1.6920	0.5726	1	1.0
1.6782	3.44	33000	1.6926	0.5733	1	1.0
1.6801	3.55	34000	1.6894	0.5719	1	1.0
1.6796	3.65	35000	1.6890	0.5728	1	1.0
1.6768	3.75	36000	1.6882	0.5722	1	1.0
1.6802	3.86	37000	1.6872	0.5732	1	1.0
1.6809	3.96	38000	1.6855	0.5750	1	1.0
1.6701	4.07	39000	1.6886	0.5742	1	1.0
1.6646	4.17	40000	1.6890	0.5734	1	1.0
1.669	4.28	41000	1.6859	0.5747	1	1.0
1.6713	4.38	42000	1.6867	0.5740	1	1.0
1.6693	4.48	43000	1.6821	0.5750	1	1.0
1.6693	4.59	44000	1.6822	0.5747	1	1.0
1.6692	4.69	45000	1.6801	0.5745	1	1.0
1.6703	4.8	46000	1.6834	0.5761	1	1.0
1.6677	4.9	47000	1.6819	0.5756	1	1.0
1.6682	5.01	48000	1.6778	0.5752	1	1.0
1.6547	5.11	49000	1.6825	0.5751	1	1.0
1.6566	5.21	50000	1.6825	0.5758	1	1.0
1.6605	5.32	51000	1.6814	0.5746	1	1.0
1.6603	5.42	52000	1.6768	0.5755	1	1.0
1.6595	5.53	53000	1.6757	0.5753	1	1.0
1.6603	5.63	54000	1.6769	0.5738	1	1.0
1.662	5.74	55000	1.6758	0.5759	1	1.0
1.6602	5.84	56000	1.6771	0.5757	1	1.0
1.6624	5.94	57000	1.6749	0.5770	1	1.0
1.6527	6.05	58000	1.6791	0.5758	1	1.0
1.6474	6.15	59000	1.6763	0.5773	1	1.0
1.6494	6.26	60000	1.6765	0.5761	1	1.0
1.6539	6.36	61000	1.6741	0.5764	1	1.0
1.6539	6.47	62000	1.6752	0.5768	1	1.0
1.6529	6.57	63000	1.6737	0.5775	1	1.0
1.6533	6.67	64000	1.6725	0.5758	1	1.0
1.653	6.78	65000	1.6722	0.5774	1	1.0
1.6522	6.88	66000	1.6726	0.5762	1	1.0
1.6528	6.99	67000	1.6726	0.5768	1	1.0
1.6439	7.09	68000	1.6728	0.5771	1	1.0
1.6403	7.19	69000	1.6703	0.5758	1	1.0
1.6447	7.3	70000	1.6697	0.5772	1	1.0
1.6458	7.4	71000	1.6694	0.5777	1	1.0
1.6447	7.51	72000	1.6716	0.5771	1	1.0
1.6449	7.61	73000	1.6680	0.5779	1	1.0
1.6458	7.72	74000	1.6683	0.5779	1	1.0
1.6447	7.82	75000	1.6681	0.5778	1	1.0
1.6451	7.92	76000	1.6677	0.5781	1	1.0
1.6418	8.03	77000	1.6665	0.5789	1	1.0
1.6361	8.13	78000	1.6684	0.5779	1	1.0
1.636	8.24	79000	1.6687	0.5786	1	1.0
1.6357	8.34	80000	1.6670	0.5790	1	1.0
1.6379	8.45	81000	1.6658	0.5788	1	1.0
1.6405	8.55	82000	1.6661	0.5788	1	1.0
1.6378	8.65	83000	1.6650	0.5789	1	1.0
1.6386	8.76	84000	1.6650	0.5784	1	1.0
1.638	8.86	85000	1.6644	0.5785	1	1.0
1.6374	8.97	86000	1.6635	0.5777	1	1.0
1.6298	9.07	87000	1.6647	0.5785	1	1.0
1.6302	9.18	88000	1.6649	0.5787	1	1.0
1.6315	9.28	89000	1.6651	0.5782	1	1.0
1.631	9.38	90000	1.6636	0.5788	1	1.0
1.6316	9.49	91000	1.6627	0.5782	1	1.0
1.6286	9.59	92000	1.6646	0.5783	1	1.0
1.6304	9.7	93000	1.6632	0.5801	1	1.0
1.6298	9.8	94000	1.6623	0.5800	1	1.0
1.6309	9.91	95000	1.6620	0.5800	1	1.0
1.6302	10.01	96000	1.6602	0.5801	1	1.0
1.6242	10.11	97000	1.6610	0.5786	1	1.0
1.6258	10.22	98000	1.6605	0.5795	1	1.0
1.6234	10.32	99000	1.6605	0.5791	1	1.0
1.6245	10.43	100000	1.6604	0.5791	1	1.0
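
To pick out a checkpoint from a log like this, the tab-separated rows can be read with the standard library. The sketch below embeds only the last five rows of the table above; the helper name `best_checkpoint` is illustrative.

```python
import csv
import io

# Header plus the last five rows of the table above, tab-separated as in the card.
TABLE_EXCERPT = (
    "Training Loss\tEpoch\tStep\tValidation Loss\tAccuracy\tMulticode K\tDead Code Fraction/layer0\n"
    "1.6302\t10.01\t96000\t1.6602\t0.5801\t1\t1.0\n"
    "1.6242\t10.11\t97000\t1.6610\t0.5786\t1\t1.0\n"
    "1.6258\t10.22\t98000\t1.6605\t0.5795\t1\t1.0\n"
    "1.6234\t10.32\t99000\t1.6605\t0.5791\t1\t1.0\n"
    "1.6245\t10.43\t100000\t1.6604\t0.5791\t1\t1.0\n"
)


def best_checkpoint(tsv_text: str) -> dict:
    """Return the row with the lowest validation loss from a tab-separated log."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return min(reader, key=lambda row: float(row["Validation Loss"]))


best = best_checkpoint(TABLE_EXCERPT)
print(best["Step"], best["Validation Loss"])
```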

Framework versions

Transformers 4.29.2
Pytorch 2.0.1+cu117
Datasets 2.12.0
Tokenizers 0.13.3

Model repository: taufeeque/TinyStories-1Layer-21M-Codebook
