collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0907
  • Num Input Tokens Seen: 15769592
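
The checkpoint can be loaded with the standard transformers causal-LM API. Below is a minimal sketch, not the author's own script: the repository id is taken from the model name above, and the dtype/device settings (BF16, device_map="auto", which requires accelerate) are assumptions rather than values stated in this card.

```python
# Minimal loading/inference sketch; repo id assumed from the model name above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: checkpoint weights are stored in BF16
    device_map="auto",           # assumption: requires accelerate to be installed
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```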

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
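
The list above maps directly onto Hugging Face TrainingArguments. The sketch below is an illustration under the Transformers 4.44 API listed under Framework versions, not the author's actual training script; the output_dir name and bf16 flag are assumptions, and the dataset, model loading, and Trainer wiring are omitted because the training data are not documented here.

```python
# Sketch of TrainingArguments matching the hyperparameters above (Transformers 4.44-style names).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0",  # assumed output name
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,   # 8 x 16 = 128 effective train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                   # Adam betas and epsilon as listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                        # assumption; checkpoint is stored in BF16
    eval_strategy="steps",
    eval_steps=5,                     # matches the 5-step eval interval in the results table
    logging_steps=5,
)
```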

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3909 | 0 |
| 1.5428 | 0.0181 | 5 | 1.3546 | 287760 |
| 1.4491 | 0.0363 | 10 | 1.2428 | 573400 |
| 1.3019 | 0.0544 | 15 | 1.1801 | 857448 |
| 1.1411 | 0.0725 | 20 | 1.1595 | 1149600 |
| 1.1571 | 0.0907 | 25 | 1.1375 | 1436240 |
| 0.9624 | 0.1088 | 30 | 1.1437 | 1719440 |
| 0.9695 | 0.1270 | 35 | 1.1431 | 2009848 |
| 0.8998 | 0.1451 | 40 | 1.1629 | 2294176 |
| 0.8159 | 0.1632 | 45 | 1.1511 | 2583312 |
| 0.7574 | 0.1814 | 50 | 1.1571 | 2871264 |
| 0.779 | 0.1995 | 55 | 1.1611 | 3152816 |
| 0.7257 | 0.2176 | 60 | 1.1561 | 3436104 |
| 0.6731 | 0.2358 | 65 | 1.1555 | 3727224 |
| 0.6363 | 0.2539 | 70 | 1.1421 | 4018624 |
| 0.7335 | 0.2720 | 75 | 1.1438 | 4302752 |
| 0.6083 | 0.2902 | 80 | 1.1426 | 4586272 |
| 0.5381 | 0.3083 | 85 | 1.1403 | 4871408 |
| 0.6245 | 0.3265 | 90 | 1.1329 | 5161168 |
| 0.5615 | 0.3446 | 95 | 1.1357 | 5445744 |
| 0.6186 | 0.3627 | 100 | 1.1283 | 5732440 |
| 0.6334 | 0.3809 | 105 | 1.1311 | 6010696 |
| 0.6243 | 0.3990 | 110 | 1.1259 | 6302592 |
| 0.4819 | 0.4171 | 115 | 1.1253 | 6590264 |
| 0.6061 | 0.4353 | 120 | 1.1222 | 6871720 |
| 0.5721 | 0.4534 | 125 | 1.1224 | 7161672 |
| 0.5233 | 0.4715 | 130 | 1.1187 | 7445560 |
| 0.5879 | 0.4897 | 135 | 1.1184 | 7731240 |
| 0.6161 | 0.5078 | 140 | 1.1148 | 8022128 |
| 0.5413 | 0.5260 | 145 | 1.1141 | 8311128 |
| 0.6638 | 0.5441 | 150 | 1.1113 | 8599464 |
| 0.5392 | 0.5622 | 155 | 1.1110 | 8882912 |
| 0.544 | 0.5804 | 160 | 1.1090 | 9170720 |
| 0.4453 | 0.5985 | 165 | 1.1078 | 9462296 |
| 0.6199 | 0.6166 | 170 | 1.1070 | 9744376 |
| 0.4516 | 0.6348 | 175 | 1.1062 | 10034944 |
| 0.4961 | 0.6529 | 180 | 1.1056 | 10319088 |
| 0.4468 | 0.6710 | 185 | 1.1047 | 10604352 |
| 0.53 | 0.6892 | 190 | 1.1032 | 10894032 |
| 0.4841 | 0.7073 | 195 | 1.1033 | 11183504 |
| 0.3908 | 0.7255 | 200 | 1.1012 | 11472104 |
| 0.562 | 0.7436 | 205 | 1.1016 | 11760408 |
| 0.5476 | 0.7617 | 210 | 1.0987 | 12042136 |
| 0.6008 | 0.7799 | 215 | 1.0985 | 12330008 |
| 0.5593 | 0.7980 | 220 | 1.0963 | 12618392 |
| 0.4895 | 0.8161 | 225 | 1.0967 | 12904064 |
| 0.5783 | 0.8343 | 230 | 1.0968 | 13192384 |
| 0.4983 | 0.8524 | 235 | 1.0945 | 13481168 |
| 0.6185 | 0.8706 | 240 | 1.0926 | 13771296 |
| 0.5053 | 0.8887 | 245 | 1.0932 | 14059504 |
| 0.5093 | 0.9068 | 250 | 1.0932 | 14347376 |
| 0.5297 | 0.9250 | 255 | 1.0926 | 14625616 |
| 0.4765 | 0.9431 | 260 | 1.0919 | 14910888 |
| 0.4637 | 0.9612 | 265 | 1.0927 | 15198944 |
| 0.3962 | 0.9794 | 270 | 1.0896 | 15487232 |
| 0.3994 | 0.9975 | 275 | 1.0907 | 15769592 |
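
Validation loss drops sharply over the first ~1.4M input tokens (1.3909 to 1.1375) and then declines slowly toward the final 1.0907. The sketch below shows one way to visualize this trend from a few rows of the table above; the loss/token values are copied from the table, while matplotlib itself is an assumption (it is not listed under Framework versions).

```python
# Sketch: validation loss vs. input tokens seen, using a subset of rows from the table above.
import matplotlib.pyplot as plt

tokens_seen = [0, 1_436_240, 2_871_264, 5_732_440, 8_599_464, 11_472_104, 14_347_376, 15_769_592]
val_loss = [1.3909, 1.1375, 1.1571, 1.1283, 1.1113, 1.1012, 1.0932, 1.0907]

plt.plot(tokens_seen, val_loss, marker="o")
plt.xlabel("Input tokens seen")
plt.ylabel("Validation loss")
plt.title("collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0: evaluation loss")
plt.show()
```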

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1