---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: gemma-2-2b_hs2_iter1_sftsd2
    results: []
---

gemma-2-2b_hs2_iter1_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2172
  • Num Input Tokens Seen: 18470688
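
Below is a minimal loading-and-generation sketch using the Transformers library. The Hub repository id is inferred from the model name and is an assumption; adjust it if the weights are hosted under a different path.

```python
# Minimal inference sketch. The repository id is assumed (inferred from the
# model name above); replace it if the checkpoint lives elsewhere.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/gemma-2-2b_hs2_iter1_sftsd2"  # assumed Hub repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "Explain supervised fine-tuning in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```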

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reconstruction sketch using these settings follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
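
The sketch below shows how these settings map onto TRL's SFTConfig and SFTTrainer. It is a reconstruction under assumptions, not the original training script: the dataset files and the dataset text field are placeholders (the training data for this checkpoint is not documented), and the 5-step evaluation/logging cadence is read off the results table that follows.

```python
# Reconstruction sketch of the hyperparameter configuration above using TRL.
# Dataset paths and the text field are placeholders; the actual training data
# for this checkpoint is not documented.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

base_model = "google/gemma-2-2b"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# Placeholder data files; substitute the real training/evaluation sets.
train_dataset = load_dataset("json", data_files="train.jsonl", split="train")
eval_dataset = load_dataset("json", data_files="eval.jsonl", split="train")

args = SFTConfig(
    output_dir="gemma-2-2b_hs2_iter1_sftsd2",
    dataset_text_field="text",        # assumed field name
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,   # 8 * 16 = 128 total train batch size
    seed=2,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    eval_strategy="steps",
    eval_steps=5,                     # matches the 5-step cadence in the results table
    logging_steps=5,
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```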

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.8375        | 0.0151 | 5    | 1.3784          | 277040            |
| 1.5937        | 0.0301 | 10   | 1.2687          | 554320            |
| 1.5082        | 0.0452 | 15   | 1.1925          | 832568            |
| 1.3528        | 0.0602 | 20   | 1.1570          | 1104816           |
| 1.29          | 0.0753 | 25   | 1.1362          | 1377136           |
| 1.2141        | 0.0903 | 30   | 1.1415          | 1648280           |
| 1.0916        | 0.1054 | 35   | 1.1517          | 1928952           |
| 1.0637        | 0.1205 | 40   | 1.1848          | 2205568           |
| 0.997         | 0.1355 | 45   | 1.2021          | 2486744           |
| 0.8411        | 0.1506 | 50   | 1.2454          | 2759672           |
| 0.819         | 0.1656 | 55   | 1.2625          | 3034344           |
| 0.8372        | 0.1807 | 60   | 1.2813          | 3310160           |
| 0.7501        | 0.1957 | 65   | 1.3245          | 3591528           |
| 0.701         | 0.2108 | 70   | 1.3285          | 3867064           |
| 0.6381        | 0.2259 | 75   | 1.3442          | 4136080           |
| 0.5853        | 0.2409 | 80   | 1.3674          | 4413080           |
| 0.5914        | 0.2560 | 85   | 1.3762          | 4697248           |
| 0.539         | 0.2710 | 90   | 1.3602          | 4976440           |
| 0.5163        | 0.2861 | 95   | 1.3418          | 5258848           |
| 0.3974        | 0.3011 | 100  | 1.3244          | 5530232           |
| 0.415         | 0.3162 | 105  | 1.3646          | 5806632           |
| 0.3812        | 0.3313 | 110  | 1.3175          | 6085304           |
| 0.3926        | 0.3463 | 115  | 1.3466          | 6366392           |
| 0.3356        | 0.3614 | 120  | 1.3194          | 6645272           |
| 0.3933        | 0.3764 | 125  | 1.3229          | 6933352           |
| 0.3463        | 0.3915 | 130  | 1.3271          | 7209752           |
| 0.3245        | 0.4065 | 135  | 1.3134          | 7487224           |
| 0.3898        | 0.4216 | 140  | 1.3007          | 7763992           |
| 0.238         | 0.4367 | 145  | 1.3160          | 8052304           |
| 0.3031        | 0.4517 | 150  | 1.3038          | 8323880           |
| 0.363         | 0.4668 | 155  | 1.3004          | 8594840           |
| 0.3207        | 0.4818 | 160  | 1.2812          | 8877704           |
| 0.2837        | 0.4969 | 165  | 1.2827          | 9158496           |
| 0.1469        | 0.5120 | 170  | 1.2875          | 9437080           |
| 0.2441        | 0.5270 | 175  | 1.2807          | 9715752           |
| 0.2553        | 0.5421 | 180  | 1.2806          | 9997688           |
| 0.2823        | 0.5571 | 185  | 1.2647          | 10279272          |
| 0.2381        | 0.5722 | 190  | 1.2680          | 10555816          |
| 0.2152        | 0.5872 | 195  | 1.2607          | 10829488          |
| 0.2018        | 0.6023 | 200  | 1.2581          | 11107824          |
| 0.2278        | 0.6174 | 205  | 1.2819          | 11388528          |
| 0.2623        | 0.6324 | 210  | 1.2529          | 11675728          |
| 0.2305        | 0.6475 | 215  | 1.2584          | 11954704          |
| 0.1346        | 0.6625 | 220  | 1.2531          | 12227408          |
| 0.2306        | 0.6776 | 225  | 1.2524          | 12509728          |
| 0.2329        | 0.6926 | 230  | 1.2434          | 12789144          |
| 0.1821        | 0.7077 | 235  | 1.2447          | 13064784          |
| 0.238         | 0.7228 | 240  | 1.2315          | 13335048          |
| 0.2227        | 0.7378 | 245  | 1.2391          | 13612832          |
| 0.2414        | 0.7529 | 250  | 1.2377          | 13892512          |
| 0.1753        | 0.7679 | 255  | 1.2327          | 14174312          |
| 0.2232        | 0.7830 | 260  | 1.2354          | 14454112          |
| 0.209         | 0.7980 | 265  | 1.2343          | 14724840          |
| 0.1725        | 0.8131 | 270  | 1.2314          | 15000280          |
| 0.1442        | 0.8282 | 275  | 1.2273          | 15282784          |
| 0.2197        | 0.8432 | 280  | 1.2237          | 15556416          |
| 0.2327        | 0.8583 | 285  | 1.2239          | 15842432          |
| 0.233         | 0.8733 | 290  | 1.2274          | 16119456          |
| 0.2136        | 0.8884 | 295  | 1.2228          | 16398960          |
| 0.1161        | 0.9034 | 300  | 1.2295          | 16675056          |
| 0.1408        | 0.9185 | 305  | 1.2214          | 16956240          |
| 0.2016        | 0.9336 | 310  | 1.2247          | 17235632          |
| 0.2294        | 0.9486 | 315  | 1.2298          | 17515584          |
| 0.1335        | 0.9637 | 320  | 1.2145          | 17798760          |
| 0.1811        | 0.9787 | 325  | 1.2251          | 18075960          |
| 0.2033        | 0.9938 | 330  | 1.2213          | 18358176          |

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1