collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

Model description

More information needed

More information needed

More information needed

The following hyperparameters were used during training:

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
No log	0	0	1.3956	0
1.5421	0.0206	5	1.3563	284760
1.4213	0.0412	10	1.2364	571568
1.3773	0.0618	15	1.1718	845064
1.2116	0.0824	20	1.1443	1127704
1.1315	0.1030	25	1.1199	1412496
1.1024	0.1236	30	1.1226	1698920
1.0443	0.1441	35	1.1252	1986472
1.0363	0.1647	40	1.1266	2267632
1.0423	0.1853	45	1.1341	2547936
0.9706	0.2059	50	1.1300	2830576
0.9604	0.2265	55	1.1429	3118224
0.9255	0.2471	60	1.1355	3404464
0.9483	0.2677	65	1.1537	3688352
0.8534	0.2883	70	1.1419	3977080
0.8731	0.3089	75	1.1393	4258200
0.8774	0.3295	80	1.1458	4542712
0.8021	0.3501	85	1.1396	4833248
0.7919	0.3707	90	1.1405	5110392
0.765	0.3912	95	1.1369	5394440
0.6146	0.4118	100	1.1466	5677160
0.7264	0.4324	105	1.1348	5959104
0.6176	0.4530	110	1.1390	6236792
0.718	0.4736	115	1.1362	6522184
0.6601	0.4942	120	1.1386	6805272
0.7045	0.5148	125	1.1291	7080584
0.6125	0.5354	130	1.1355	7359048
0.7828	0.5560	135	1.1299	7639800
0.7475	0.5766	140	1.1292	7925000
0.7263	0.5972	145	1.1283	8212784
0.591	0.6178	150	1.1274	8498984
0.6697	0.6384	155	1.1224	8783480
0.6356	0.6589	160	1.1216	9069640
0.6016	0.6795	165	1.1205	9358968
0.5734	0.7001	170	1.1175	9644264
0.5932	0.7207	175	1.1157	9934824
0.5129	0.7413	180	1.1148	10221456
0.6567	0.7619	185	1.1130	10498184
0.6554	0.7825	190	1.1117	10777688
0.5459	0.8031	195	1.1105	11062480
0.6166	0.8237	200	1.1069	11343448
0.6983	0.8443	205	1.1061	11620888
0.5964	0.8649	210	1.1052	11908944
0.5881	0.8855	215	1.1031	12192472
0.5667	0.9060	220	1.1026	12474256
0.5131	0.9266	225	1.1018	12762728
0.5854	0.9472	230	1.0999	13045696
0.6179	0.9678	235	1.1003	13323080
0.5287	0.9884	240	1.0984	13609776