# collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd1
This model is a fine-tuned version of google/gemma-2-9b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.9540
- Num Input Tokens Seen: 23746304
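
Below is a minimal usage sketch, assuming the standard `transformers` causal-LM API; the checkpoint id is taken from this repository, while the prompt, dtype, and generation settings are illustrative placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# bfloat16 is an illustrative choice for a 9B model; adjust to your hardware.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```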
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged `TrainingArguments` sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
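
As a point of reference, here is a sketch of how these hyperparameters map onto `transformers.TrainingArguments`; the `output_dir` is a placeholder, and the sketch assumes a single device, consistent with the reported total train batch size of 128 (4 × 32). The model and dataset setup are not shown.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd1",  # placeholder
    learning_rate=8e-06,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=32,  # 4 x 32 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,    # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```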
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.2335 | 0 |
1.3192 | 0.0106 | 5 | 1.1930 | 251724 |
1.2066 | 0.0211 | 10 | 1.0874 | 499628 |
1.068 | 0.0317 | 15 | 1.0377 | 746640 |
0.7682 | 0.0423 | 20 | 1.0245 | 998064 |
0.6356 | 0.0529 | 25 | 1.0272 | 1247608 |
0.593 | 0.0634 | 30 | 1.0298 | 1506576 |
0.4293 | 0.0740 | 35 | 1.0250 | 1764228 |
0.3318 | 0.0846 | 40 | 1.0225 | 2018200 |
0.336 | 0.0952 | 45 | 1.0146 | 2271692 |
0.3199 | 0.1057 | 50 | 1.0056 | 2516500 |
0.268 | 0.1163 | 55 | 1.0010 | 2771000 |
0.3191 | 0.1269 | 60 | 0.9958 | 3020960 |
0.3025 | 0.1374 | 65 | 0.9967 | 3270868 |
0.2459 | 0.1480 | 70 | 0.9926 | 3522436 |
0.3354 | 0.1586 | 75 | 0.9893 | 3769816 |
0.3724 | 0.1692 | 80 | 0.9870 | 4023492 |
0.3067 | 0.1797 | 85 | 0.9857 | 4280320 |
0.3403 | 0.1903 | 90 | 0.9824 | 4533272 |
0.2916 | 0.2009 | 95 | 0.9814 | 4782572 |
0.2843 | 0.2115 | 100 | 0.9797 | 5032932 |
0.2819 | 0.2220 | 105 | 0.9773 | 5288380 |
0.192 | 0.2326 | 110 | 0.9761 | 5544116 |
0.223 | 0.2432 | 115 | 0.9750 | 5792512 |
0.2713 | 0.2538 | 120 | 0.9727 | 6045400 |
0.2705 | 0.2643 | 125 | 0.9716 | 6296860 |
0.3249 | 0.2749 | 130 | 0.9712 | 6546012 |
0.2854 | 0.2855 | 135 | 0.9688 | 6806812 |
0.242 | 0.2960 | 140 | 0.9702 | 7059244 |
0.2796 | 0.3066 | 145 | 0.9696 | 7303032 |
0.3008 | 0.3172 | 150 | 0.9680 | 7552056 |
0.2189 | 0.3278 | 155 | 0.9691 | 7807484 |
0.2304 | 0.3383 | 160 | 0.9697 | 8044776 |
0.2107 | 0.3489 | 165 | 0.9691 | 8297808 |
0.2989 | 0.3595 | 170 | 0.9663 | 8540908 |
0.2247 | 0.3701 | 175 | 0.9648 | 8800560 |
0.322 | 0.3806 | 180 | 0.9657 | 9055232 |
0.2985 | 0.3912 | 185 | 0.9644 | 9309612 |
0.2521 | 0.4018 | 190 | 0.9645 | 9563664 |
0.3678 | 0.4123 | 195 | 0.9656 | 9821076 |
0.2472 | 0.4229 | 200 | 0.9652 | 10073776 |
0.2586 | 0.4335 | 205 | 0.9634 | 10328452 |
0.2413 | 0.4441 | 210 | 0.9642 | 10588044 |
0.2502 | 0.4546 | 215 | 0.9636 | 10835888 |
0.2606 | 0.4652 | 220 | 0.9634 | 11091212 |
0.2124 | 0.4758 | 225 | 0.9637 | 11346152 |
0.3122 | 0.4864 | 230 | 0.9618 | 11605376 |
0.3033 | 0.4969 | 235 | 0.9603 | 11859096 |
0.4002 | 0.5075 | 240 | 0.9617 | 12112920 |
0.2307 | 0.5181 | 245 | 0.9596 | 12364040 |
0.3126 | 0.5286 | 250 | 0.9601 | 12612780 |
0.2353 | 0.5392 | 255 | 0.9606 | 12862168 |
0.2599 | 0.5498 | 260 | 0.9596 | 13118060 |
0.2025 | 0.5604 | 265 | 0.9582 | 13367704 |
0.2109 | 0.5709 | 270 | 0.9580 | 13618324 |
0.223 | 0.5815 | 275 | 0.9605 | 13868120 |
0.2711 | 0.5921 | 280 | 0.9605 | 14117864 |
0.2627 | 0.6027 | 285 | 0.9580 | 14361576 |
0.299 | 0.6132 | 290 | 0.9576 | 14613300 |
0.2192 | 0.6238 | 295 | 0.9569 | 14863772 |
0.2936 | 0.6344 | 300 | 0.9580 | 15117120 |
0.213 | 0.6449 | 305 | 0.9583 | 15367888 |
0.212 | 0.6555 | 310 | 0.9583 | 15620172 |
0.207 | 0.6661 | 315 | 0.9586 | 15867484 |
0.2712 | 0.6767 | 320 | 0.9580 | 16119180 |
0.2482 | 0.6872 | 325 | 0.9565 | 16372908 |
0.2093 | 0.6978 | 330 | 0.9549 | 16621448 |
0.2663 | 0.7084 | 335 | 0.9566 | 16873836 |
0.2744 | 0.7190 | 340 | 0.9569 | 17124156 |
0.2421 | 0.7295 | 345 | 0.9559 | 17371376 |
0.2775 | 0.7401 | 350 | 0.9555 | 17618956 |
0.1681 | 0.7507 | 355 | 0.9553 | 17871212 |
0.261 | 0.7613 | 360 | 0.9547 | 18122512 |
0.2847 | 0.7718 | 365 | 0.9541 | 18373000 |
0.2619 | 0.7824 | 370 | 0.9535 | 18624220 |
0.2633 | 0.7930 | 375 | 0.9543 | 18869156 |
0.2664 | 0.8035 | 380 | 0.9544 | 19118420 |
0.2411 | 0.8141 | 385 | 0.9527 | 19372484 |
0.2671 | 0.8247 | 390 | 0.9521 | 19619032 |
0.2495 | 0.8353 | 395 | 0.9528 | 19870712 |
0.1758 | 0.8458 | 400 | 0.9530 | 20120812 |
0.2762 | 0.8564 | 405 | 0.9534 | 20376396 |
0.3021 | 0.8670 | 410 | 0.9533 | 20629920 |
0.2111 | 0.8776 | 415 | 0.9535 | 20874064 |
0.2174 | 0.8881 | 420 | 0.9525 | 21129004 |
0.2087 | 0.8987 | 425 | 0.9531 | 21379724 |
0.3074 | 0.9093 | 430 | 0.9521 | 21630104 |
0.2314 | 0.9198 | 435 | 0.9519 | 21878192 |
0.224 | 0.9304 | 440 | 0.9527 | 22128940 |
0.2769 | 0.9410 | 445 | 0.9525 | 22382812 |
0.2538 | 0.9516 | 450 | 0.9538 | 22638260 |
0.2352 | 0.9621 | 455 | 0.9539 | 22886880 |
0.3061 | 0.9727 | 460 | 0.9513 | 23143232 |
0.2891 | 0.9833 | 465 | 0.9506 | 23396844 |
0.252 | 0.9939 | 470 | 0.9551 | 23650700 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1