---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_replace_iter10_sftsd0
  results: []
---
# collapse_gemma-2-2b_hs2_replace_iter10_sftsd0

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set, with a brief usage sketch below:
- Loss: 2.6113
- Num Input Tokens Seen: 4842072
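
The checkpoint loads like any other Gemma-2 causal language model via `transformers`. A minimal usage sketch, assuming a hypothetical repo id (the published path for this checkpoint is not stated in this card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id; replace with the actual location of this checkpoint.
model_id = "your-org/collapse_gemma-2-2b_hs2_replace_iter10_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```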
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
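
As a reproduction aid, here is a minimal sketch of how the hyperparameters above map onto a TRL `SFTTrainer` run. The dataset is a stand-in (the actual training data is undocumented), and on recent TRL versions `args` should be an `SFTConfig` rather than a plain `TrainingArguments`:

```python
from datasets import Dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Stand-in dataset with the default "text" column; the real data is not documented.
train_dataset = Dataset.from_dict({"text": ["example document"] * 128})

args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter10_sftsd0",
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)

trainer = SFTTrainer(
    model="google/gemma-2-2b",  # SFTTrainer also accepts an already-loaded model
    args=args,
    train_dataset=train_dataset,
)
trainer.train()
```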
### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.4765        | 0.0511 | 5    | 1.2806          | 246008            |
| 0.8873        | 0.1022 | 10   | 1.3078          | 501048            |
| 0.5453        | 0.1533 | 15   | 1.4836          | 748784            |
| 0.3639        | 0.2043 | 20   | 1.6678          | 995320            |
| 0.1682        | 0.2554 | 25   | 1.8911          | 1241960           |
| 0.141         | 0.3065 | 30   | 2.1355          | 1482280           |
| 0.1008        | 0.3576 | 35   | 2.2627          | 1727544           |
| 0.0333        | 0.4087 | 40   | 2.3954          | 1977544           |
| 0.027         | 0.4598 | 45   | 2.4283          | 2229912           |
| 0.0236        | 0.5109 | 50   | 2.5144          | 2487368           |
| 0.0233        | 0.5619 | 55   | 2.5364          | 2733936           |
| 0.0236        | 0.6130 | 60   | 2.5347          | 2984368           |
| 0.021         | 0.6641 | 65   | 2.5419          | 3234792           |
| 0.026         | 0.7152 | 70   | 2.5644          | 3479584           |
| 0.0223        | 0.7663 | 75   | 2.5735          | 3724952           |
| 0.0364        | 0.8174 | 80   | 2.5809          | 3980264           |
| 0.0232        | 0.8685 | 85   | 2.5900          | 4228400           |
| 0.0296        | 0.9195 | 90   | 2.5925          | 4484104           |
| 0.023         | 0.9706 | 95   | 2.6062          | 4735712           |
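
The table shows training loss falling from 1.4765 to roughly 0.02 while validation loss climbs from 1.2806 to 2.6062, a clear train/validation divergence. A minimal matplotlib sketch to visualize the two curves, using only values transcribed from the table above:

```python
import matplotlib.pyplot as plt

# Values transcribed from the training results table above.
steps = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
val_loss = [1.3909, 1.2806, 1.3078, 1.4836, 1.6678, 1.8911, 2.1355, 2.2627,
            2.3954, 2.4283, 2.5144, 2.5364, 2.5347, 2.5419, 2.5644, 2.5735,
            2.5809, 2.5900, 2.5925, 2.6062]
train_loss = [1.4765, 0.8873, 0.5453, 0.3639, 0.1682, 0.141, 0.1008, 0.0333,
              0.027, 0.0236, 0.0233, 0.0236, 0.021, 0.026, 0.0223, 0.0364,
              0.0232, 0.0296, 0.023]  # step 0 logs no training loss ("No log")

plt.plot(steps, val_loss, label="validation loss")
plt.plot(steps[1:], train_loss, label="training loss")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.show()
```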
### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1