# collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd0
This model is a fine-tuned version of [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.9420
- Num Input Tokens Seen: 21270888
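This card ships no usage example. Below is a minimal loading sketch, assuming the checkpoint lives at `RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd0` (the repository this card describes) and loads through the standard `transformers` causal-LM API; the dtype and device placement are illustrative choices, not documented settings.

```python
# Minimal loading sketch; repo id taken from this card, everything else assumed.
# A 27B model in bfloat16 needs roughly 55-60 GB of accelerator memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Gemma 2 weights are published in bfloat16
    device_map="auto",           # shard across available GPUs (requires accelerate)
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```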
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged `TrainingArguments` reconstruction follows the list):
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
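As a non-authoritative sketch, the listed values map onto a `transformers.TrainingArguments` configuration roughly as below. Note the effective batch size: 4 per device × 32 accumulation steps = 128, consistent with a single training device. The logging/eval cadence and token-counting flag are inferred from the results table; the output directory and the surrounding `Trainer`/dataset wiring are assumptions, not the authors' actual script.

```python
# Hedged reconstruction of the listed hyperparameters (transformers 4.44.0).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd0",  # assumed
    learning_rate=8e-6,
    per_device_train_batch_size=4,   # x 32 accumulation steps = 128 effective
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=32,
    seed=0,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                  # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    logging_steps=5,                 # matches the 5-step cadence in the results table
    eval_strategy="steps",           # evaluation every 5 steps, per the table
    eval_steps=5,
    include_num_input_tokens_seen=True,  # produces the "Input Tokens Seen" column
)
```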
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---------------|-------|------|-----------------|-------------------|
No log | 0 | 0 | 1.1282 | 0 |
3.3744 | 0.0119 | 5 | 1.0898 | 257008 |
2.7344 | 0.0239 | 10 | 1.0110 | 511176 |
3.0196 | 0.0358 | 15 | 0.9968 | 766352 |
2.6511 | 0.0477 | 20 | 0.9826 | 1021480 |
2.4604 | 0.0597 | 25 | 0.9840 | 1268604 |
2.3963 | 0.0716 | 30 | 0.9900 | 1514644 |
2.5123 | 0.0835 | 35 | 0.9893 | 1773960 |
2.267 | 0.0955 | 40 | 0.9847 | 2024128 |
2.2102 | 0.1074 | 45 | 0.9850 | 2275396 |
2.1111 | 0.1193 | 50 | 0.9831 | 2532996 |
2.2625 | 0.1312 | 55 | 0.9879 | 2786708 |
1.9372 | 0.1432 | 60 | 0.9866 | 3040624 |
1.6937 | 0.1551 | 65 | 0.9815 | 3290344 |
1.643 | 0.1670 | 70 | 0.9785 | 3545668 |
1.6501 | 0.1790 | 75 | 0.9776 | 3796572 |
1.533 | 0.1909 | 80 | 0.9773 | 4049160 |
1.429 | 0.2028 | 85 | 0.9747 | 4305488 |
1.5398 | 0.2148 | 90 | 0.9717 | 4555888 |
1.4633 | 0.2267 | 95 | 0.9719 | 4803416 |
1.216 | 0.2386 | 100 | 0.9739 | 5063736 |
1.2603 | 0.2506 | 105 | 0.9643 | 5316860 |
1.5926 | 0.2625 | 110 | 0.9673 | 5573004 |
1.3854 | 0.2744 | 115 | 0.9663 | 5831764 |
1.4685 | 0.2864 | 120 | 0.9643 | 6088656 |
1.2814 | 0.2983 | 125 | 0.9634 | 6343448 |
1.4396 | 0.3102 | 130 | 0.9613 | 6596508 |
1.2699 | 0.3221 | 135 | 0.9591 | 6855352 |
1.2285 | 0.3341 | 140 | 0.9617 | 7111244 |
1.2788 | 0.3460 | 145 | 0.9611 | 7360936 |
1.3858 | 0.3579 | 150 | 0.9586 | 7617700 |
1.2758 | 0.3699 | 155 | 0.9615 | 7876384 |
1.2891 | 0.3818 | 160 | 0.9556 | 8132024 |
1.3362 | 0.3937 | 165 | 0.9589 | 8384580 |
1.306 | 0.4057 | 170 | 0.9557 | 8634968 |
1.3192 | 0.4176 | 175 | 0.9574 | 8888388 |
1.3276 | 0.4295 | 180 | 0.9537 | 9137756 |
1.3805 | 0.4415 | 185 | 0.9530 | 9392512 |
1.2827 | 0.4534 | 190 | 0.9540 | 9652288 |
1.2674 | 0.4653 | 195 | 0.9523 | 9908688 |
1.4104 | 0.4773 | 200 | 0.9512 | 10160904 |
1.3507 | 0.4892 | 205 | 0.9547 | 10411964 |
1.5425 | 0.5011 | 210 | 0.9498 | 10663204 |
1.2436 | 0.5130 | 215 | 0.9523 | 10910880 |
1.3822 | 0.5250 | 220 | 0.9495 | 11163128 |
1.2537 | 0.5369 | 225 | 0.9531 | 11415784 |
1.1275 | 0.5488 | 230 | 0.9494 | 11669324 |
1.2746 | 0.5608 | 235 | 0.9499 | 11928348 |
1.1185 | 0.5727 | 240 | 0.9482 | 12186304 |
1.151 | 0.5846 | 245 | 0.9504 | 12439048 |
1.3418 | 0.5966 | 250 | 0.9459 | 12695300 |
1.2136 | 0.6085 | 255 | 0.9465 | 12956336 |
1.3555 | 0.6204 | 260 | 0.9489 | 13207856 |
1.1649 | 0.6324 | 265 | 0.9455 | 13462604 |
1.1214 | 0.6443 | 270 | 0.9458 | 13710788 |
1.1163 | 0.6562 | 275 | 0.9450 | 13968176 |
1.081 | 0.6682 | 280 | 0.9453 | 14220592 |
1.1374 | 0.6801 | 285 | 0.9431 | 14473072 |
1.4752 | 0.6920 | 290 | 0.9449 | 14719728 |
1.2133 | 0.7040 | 295 | 0.9462 | 14971712 |
1.185 | 0.7159 | 300 | 0.9434 | 15223916 |
1.4205 | 0.7278 | 305 | 0.9459 | 15484568 |
1.1185 | 0.7397 | 310 | 0.9448 | 15737896 |
1.1153 | 0.7517 | 315 | 0.9441 | 15994588 |
1.3097 | 0.7636 | 320 | 0.9413 | 16249952 |
1.2363 | 0.7755 | 325 | 0.9454 | 16503828 |
1.2772 | 0.7875 | 330 | 0.9407 | 16758796 |
1.1471 | 0.7994 | 335 | 0.9428 | 17009524 |
1.196 | 0.8113 | 340 | 0.9406 | 17259112 |
1.1234 | 0.8233 | 345 | 0.9424 | 17507920 |
1.2518 | 0.8352 | 350 | 0.9377 | 17761472 |
1.3816 | 0.8471 | 355 | 0.9454 | 18018912 |
1.2513 | 0.8591 | 360 | 0.9391 | 18274396 |
1.2215 | 0.8710 | 365 | 0.9404 | 18534984 |
1.3596 | 0.8829 | 370 | 0.9403 | 18789204 |
1.1752 | 0.8949 | 375 | 0.9404 | 19044632 |
1.1623 | 0.9068 | 380 | 0.9422 | 19301532 |
1.3607 | 0.9187 | 385 | 0.9392 | 19559488 |
1.1718 | 0.9306 | 390 | 0.9397 | 19804384 |
1.2385 | 0.9426 | 395 | 0.9396 | 20056360 |
1.2311 | 0.9545 | 400 | 0.9443 | 20312576 |
1.3821 | 0.9664 | 405 | 0.9408 | 20561528 |
1.3602 | 0.9784 | 410 | 0.9416 | 20814848 |
1.2911 | 0.9903 | 415 | 0.9420 | 21072348 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
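To approximate this environment, the pins above can be installed as follows (a sketch; the `+cu121` PyTorch build comes from the PyTorch wheel index and depends on your CUDA setup):

```bash
pip install transformers==4.44.0 datasets==2.20.0 tokenizers==0.19.1
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
```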