llama8b-gsm-real-sftsd1
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.0750
- Num Input Tokens Seen: 1235796
Model description
More information needed
Intended uses & limitations
More information needed
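As a usage sketch only (the card does not ship an official example), the checkpoint should load like any other causal LM with transformers. The hub id jkazdan/llama8b-gsm-real-sftsd1 is taken from this card, and the chat-template usage is an assumption based on the base model being an Instruct variant; the example prompt is a grade-school math word problem chosen only because of the "gsm" tag in the model name.

```python
# Usage sketch, not an official example from the model authors.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/llama8b-gsm-real-sftsd1"  # hub id from this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The base model is an Instruct variant, so the chat template is assumed to apply.
messages = [
    {"role": "user", "content": "Natalia sold 48 clips in April and half as many in May. How many clips did she sell in total?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```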
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (see the configuration sketch after the list):
- learning_rate: 8e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
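For reference, a minimal sketch of how these settings map onto transformers TrainingArguments. The exact training script is not part of this card, so treat this as an illustrative reconstruction rather than the configuration actually used.

```python
# Sketch only: maps the listed hyperparameters onto TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama8b-gsm-real-sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=1,
    gradient_accumulation_steps=16,   # 2 x 16 = total train batch size 32
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    logging_steps=5,                  # assumption: matches the evaluation cadence below
    eval_strategy="steps",
    eval_steps=5,
)
```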
Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.8595 | 0 |
1.7608 | 0.0214 | 5 | 1.6700 | 25930 |
1.3248 | 0.0428 | 10 | 1.3475 | 52270 |
1.2071 | 0.0642 | 15 | 1.2084 | 79554 |
1.1995 | 0.0856 | 20 | 1.1763 | 105102 |
1.0962 | 0.1070 | 25 | 1.1607 | 131956 |
1.1212 | 0.1284 | 30 | 1.1494 | 158684 |
1.1985 | 0.1499 | 35 | 1.1423 | 184480 |
1.0998 | 0.1713 | 40 | 1.1370 | 211054 |
1.1959 | 0.1927 | 45 | 1.1324 | 236974 |
1.1464 | 0.2141 | 50 | 1.1279 | 262912 |
1.2088 | 0.2355 | 55 | 1.1243 | 289396 |
1.0862 | 0.2569 | 60 | 1.1215 | 316814 |
1.17 | 0.2783 | 65 | 1.1191 | 342274 |
1.079 | 0.2997 | 70 | 1.1173 | 369198 |
1.155 | 0.3211 | 75 | 1.1141 | 396132 |
1.122 | 0.3425 | 80 | 1.1118 | 421548 |
1.0646 | 0.3639 | 85 | 1.1104 | 449306 |
1.1247 | 0.3853 | 90 | 1.1071 | 473942 |
1.0455 | 0.4067 | 95 | 1.1065 | 500546 |
1.1771 | 0.4282 | 100 | 1.1047 | 525364 |
1.0121 | 0.4496 | 105 | 1.1031 | 552868 |
1.0939 | 0.4710 | 110 | 1.1028 | 579098 |
1.133 | 0.4924 | 115 | 1.1005 | 604876 |
1.0363 | 0.5138 | 120 | 1.0987 | 629760 |
0.9986 | 0.5352 | 125 | 1.0972 | 657158 |
1.0632 | 0.5566 | 130 | 1.0968 | 683064 |
1.0441 | 0.5780 | 135 | 1.0940 | 710802 |
1.0112 | 0.5994 | 140 | 1.0930 | 737182 |
1.0467 | 0.6208 | 145 | 1.0914 | 763298 |
1.0917 | 0.6422 | 150 | 1.0897 | 790790 |
1.0613 | 0.6636 | 155 | 1.0891 | 818288 |
0.9827 | 0.6850 | 160 | 1.0883 | 845282 |
1.1266 | 0.7064 | 165 | 1.0874 | 870452 |
1.0661 | 0.7279 | 170 | 1.0859 | 896976 |
1.1039 | 0.7493 | 175 | 1.0852 | 923846 |
1.0813 | 0.7707 | 180 | 1.0842 | 949236 |
1.0729 | 0.7921 | 185 | 1.0835 | 977230 |
1.0617 | 0.8135 | 190 | 1.0838 | 1003880 |
1.1071 | 0.8349 | 195 | 1.0825 | 1029762 |
1.0408 | 0.8563 | 200 | 1.0810 | 1057616 |
1.0801 | 0.8777 | 205 | 1.0799 | 1084200 |
1.0656 | 0.8991 | 210 | 1.0786 | 1110340 |
1.1181 | 0.9205 | 215 | 1.0787 | 1136600 |
0.9485 | 0.9419 | 220 | 1.0782 | 1164358 |
1.0608 | 0.9633 | 225 | 1.0772 | 1192626 |
1.1137 | 0.9847 | 230 | 1.0755 | 1219714 |
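The validation loss drops quickly over the first ~20 steps and then decreases slowly toward the final 1.0755. A small sketch for plotting the curve from the table above, assuming this card is saved locally as README.md; the parsing is naive and only picks up the five-column rows of the training-results table.

```python
# Sketch: plot step vs. validation loss from the table in this card.
import matplotlib.pyplot as plt

steps, val_losses = [], []
with open("README.md") as f:
    for line in f:
        cells = [c.strip() for c in line.split("|") if c.strip()]
        # Data rows have 5 cells and an integer step in the third column.
        if len(cells) == 5 and cells[2].isdigit():
            steps.append(int(cells[2]))
            val_losses.append(float(cells[3]))

plt.plot(steps, val_losses, marker="o")
plt.xlabel("Step")
plt.ylabel("Validation loss")
plt.title("llama8b-gsm-real-sftsd1 validation loss")
plt.show()
```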
Framework versions
- Transformers 4.46.0
- Pytorch 2.4.1.post300
- Datasets 2.20.0
- Tokenizers 0.20.1
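A quick environment check against the versions above; nearby patch releases will likely also work, but these are the versions reported by the trainer.

```python
# Environment check sketch for the framework versions listed in this card.
import datasets
import tokenizers
import torch
import transformers

print(transformers.__version__)  # 4.46.0 on the training machine
print(torch.__version__)         # 2.4.1.post300
print(datasets.__version__)      # 2.20.0
print(tokenizers.__version__)    # 0.20.1
```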