
gemma2_on_korean_conv-stm

This model is a fine-tuned version of beomi/gemma-ko-2b on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1996

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 10
  • total_train_batch_size: 20
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • training_steps: 2000
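
For reference, the list above maps onto a Hugging Face `TrainingArguments` configuration roughly as sketched below. The output directory and explicit evaluation settings are assumptions (the 100-step evaluation interval is inferred from the results table); everything else comes from the list or is a library default.

```python
from transformers import TrainingArguments

# Minimal sketch reproducing the hyperparameters above.
# output_dir is an assumed placeholder, not part of this card.
training_args = TrainingArguments(
    output_dir="gemma2_on_korean_conv-stm",
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    gradient_accumulation_steps=10,   # effective train batch size: 2 * 10 = 20
    lr_scheduler_type="linear",
    warmup_steps=100,
    max_steps=2000,
    eval_strategy="steps",            # assumed: the results table logs eval every 100 steps
    eval_steps=100,
    optim="adamw_torch",              # Adam with betas=(0.9, 0.999), epsilon=1e-08 (defaults)
)
```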

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.4813        | 0.2563 | 100  | 1.4715          |
| 1.3177        | 0.5126 | 200  | 1.3092          |
| 1.2445        | 0.7688 | 300  | 1.2380          |
| 1.0947        | 1.0251 | 400  | 1.1796          |
| 0.996         | 1.2814 | 500  | 1.1585          |
| 0.9617        | 1.5377 | 600  | 1.1360          |
| 0.9645        | 1.7940 | 700  | 1.1112          |
| 0.7718        | 2.0502 | 800  | 1.1270          |
| 0.7281        | 2.3065 | 900  | 1.1372          |
| 0.7437        | 2.5628 | 1000 | 1.1040          |
| 0.7588        | 2.8191 | 1100 | 1.0921          |
| 0.5759        | 3.0753 | 1200 | 1.1330          |
| 0.5811        | 3.3316 | 1300 | 1.1485          |
| 0.6025        | 3.5879 | 1400 | 1.1298          |
| 0.5766        | 3.8442 | 1500 | 1.1391          |
| 0.4555        | 4.1005 | 1600 | 1.1785          |
| 0.4426        | 4.3567 | 1700 | 1.1874          |
| 0.4461        | 4.6130 | 1800 | 1.1865          |
| 0.4506        | 4.8693 | 1900 | 1.1902          |
| 0.3731        | 5.1256 | 2000 | 1.1996          |

Framework versions

  • PEFT 0.11.1
  • Transformers 4.41.1
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1

Adapter for beomi/gemma-ko-2b
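
Since this repository contains a PEFT adapter rather than full model weights, it is loaded on top of the beomi/gemma-ko-2b base model. The sketch below assumes the pinned framework versions listed above; the adapter repository id is a placeholder to be replaced with this model's actual repo id.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "beomi/gemma-ko-2b"
ADAPTER_ID = "your-username/gemma2_on_korean_conv-stm"  # placeholder, not the real repo id

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.bfloat16)

# Attach the PEFT adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
model.eval()

prompt = "안녕하세요, 오늘 날씨가 어때요?"  # example Korean conversational prompt
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```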