metadata

license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_replace_iter4_sftsd2
    results: []

collapse_gemma-2-2b_hs2_replace_iter4_sftsd2

This model is a fine-tuned version of google/gemma-2-2b; the training dataset was not recorded in the card metadata. It achieves the following results on the evaluation set:

Loss: 2.2138
Num Input Tokens Seen: 5073848
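
For readers who prefer perplexity, the reported loss can be converted under one assumption that the card does not state: that the loss is a mean per-token cross-entropy in nats, which is what the Transformers Trainer normally reports for causal language models. A minimal sketch:

```python
import math

# Evaluation loss reported in this card.
eval_loss = 2.2138

# Assumption: the loss is a mean per-token cross-entropy in nats,
# so perplexity is exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.2f}")
```

Under that assumption the evaluation perplexity comes out at roughly 9.15.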

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 8e-06
train_batch_size: 8
eval_batch_size: 16
seed: 2
gradient_accumulation_steps: 16
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: constant_with_warmup
lr_scheduler_warmup_ratio: 0.05
num_epochs: 1
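
The relationship between these values can be sketched in a few lines. Two things here are assumptions rather than facts from the card: a single training device (the only device count for which 8 x 16 equals the reported 128), and a total of about 98 optimizer steps, estimated from the results table in this card, where step 5 corresponds to epoch 0.0511. The `lr_at` helper is a hypothetical name that mirrors the usual constant-with-warmup rule, not code taken from the training run.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 8e-06
train_batch_size = 8
gradient_accumulation_steps = 16
warmup_ratio = 0.05

# Assumption: one training device, which makes the product match
# the reported total_train_batch_size of 128.
num_devices = 1
total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)

# Assumption: total optimizer steps estimated from the results table
# (step 5 is logged at epoch 0.0511).
estimated_total_steps = round(5 / 0.0511)
warmup_steps = math.ceil(warmup_ratio * estimated_total_steps)


def lr_at(step: int) -> float:
    """Constant-with-warmup: linear ramp from 0, then flat at learning_rate."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate


print(total_train_batch_size, estimated_total_steps, warmup_steps)
print([lr_at(s) for s in range(0, 8)])
```

With these estimates the warmup lasts about 5 optimizer steps, after which the learning rate stays at 8e-06 for the rest of the epoch.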

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
No log	0	0	1.3909	0
1.5334	0.0511	5	1.2716	258224
1.0476	0.1022	10	1.2402	516160
0.6789	0.1533	15	1.4191	780248
0.4452	0.2043	20	1.5919	1036256
0.2474	0.2554	25	1.7709	1301368
0.1388	0.3065	30	1.9171	1563976
0.1272	0.3576	35	2.0396	1822440
0.0532	0.4087	40	2.1483	2075376
0.0510	0.4598	45	2.1510	2339344
0.0471	0.5109	50	2.2012	2612616
0.0398	0.5619	55	2.1902	2869216
0.0414	0.6130	60	2.2098	3123616
0.0326	0.6641	65	2.1951	3390016
0.0353	0.7152	70	2.1734	3658096
0.0301	0.7663	75	2.1824	3926288
0.0286	0.8174	80	2.1903	4184328
0.0316	0.8685	85	2.1851	4447976
0.0291	0.9195	90	2.2216	4705152
0.0276	0.9706	95	2.2168	4966096
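
The table shows validation loss reaching its minimum early and then rising while training loss keeps falling. The sketch below makes that explicit using only the numbers in the table; the variable names are illustrative, and the step-0 row is left out because it has no training loss (its validation loss, 1.3909, is above the minimum anyway).

```python
# (step, training_loss, validation_loss) copied from the table above.
rows = [
    (5, 1.5334, 1.2716), (10, 1.0476, 1.2402), (15, 0.6789, 1.4191),
    (20, 0.4452, 1.5919), (25, 0.2474, 1.7709), (30, 0.1388, 1.9171),
    (35, 0.1272, 2.0396), (40, 0.0532, 2.1483), (45, 0.0510, 2.1510),
    (50, 0.0471, 2.2012), (55, 0.0398, 2.1902), (60, 0.0414, 2.2098),
    (65, 0.0326, 2.1951), (70, 0.0353, 2.1734), (75, 0.0301, 2.1824),
    (80, 0.0286, 2.1903), (85, 0.0316, 2.1851), (90, 0.0291, 2.2216),
    (95, 0.0276, 2.2168),
]

# Step with the lowest validation loss.
best_step, _, best_val = min(rows, key=lambda r: r[2])

# Gap between validation and training loss at the last logged step.
last_step, last_train, last_val = rows[-1]
gap = last_val - last_train

print(f"lowest validation loss {best_val} at step {best_step}")
print(f"validation minus training loss at step {last_step}: {gap:.4f}")
```

A widening gap of this kind is commonly read as overfitting to the training data, although the card itself gives no interpretation of these results.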

Framework versions

Transformers 4.44.0
PyTorch 2.4.0+cu121
Datasets 2.20.0
Tokenizers 0.19.1