---
license: other
library_name: peft
tags:
  - generated_from_trainer
base_model: beomi/gemma-ko-2b
model-index:
  - name: gemma2_on_korean_conv-stm
    results: []
---

gemma2_on_korean_conv-stm

This model is a fine-tuned version of beomi/gemma-ko-2b on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1996

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 10
  • total_train_batch_size: 20
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • training_steps: 2000
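
The fine-tune uses the PEFT library (see Framework versions below), so the Trainer wraps a PEFT-adapted gemma-ko-2b model. As a rough reference, the sketch below shows how the values listed above map onto a transformers `TrainingArguments` object; the dataset, tokenization, and LoRA/PEFT adapter settings are not documented in this card and are therefore omitted, and the `output_dir` name is only assumed from the model id.

```python
# Minimal sketch of a TrainingArguments setup matching the hyperparameters above.
# The dataset and PEFT/LoRA adapter configuration are not documented in this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gemma2_on_korean_conv-stm",  # assumed from the model id
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    gradient_accumulation_steps=10,          # effective train batch size: 2 * 10 = 20
    lr_scheduler_type="linear",
    warmup_steps=100,
    max_steps=2000,
    optim="adamw_torch",                     # betas=(0.9, 0.999), eps=1e-8 are the defaults
    evaluation_strategy="steps",             # the results table shows evaluation every 100 steps
    eval_steps=100,
    logging_steps=100,
)
```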

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.4813        | 0.2563 | 100  | 1.4715          |
| 1.3177        | 0.5126 | 200  | 1.3092          |
| 1.2445        | 0.7688 | 300  | 1.2380          |
| 1.0947        | 1.0251 | 400  | 1.1796          |
| 0.996         | 1.2814 | 500  | 1.1585          |
| 0.9617        | 1.5377 | 600  | 1.1360          |
| 0.9645        | 1.7940 | 700  | 1.1112          |
| 0.7718        | 2.0502 | 800  | 1.1270          |
| 0.7281        | 2.3065 | 900  | 1.1372          |
| 0.7437        | 2.5628 | 1000 | 1.1040          |
| 0.7588        | 2.8191 | 1100 | 1.0921          |
| 0.5759        | 3.0753 | 1200 | 1.1330          |
| 0.5811        | 3.3316 | 1300 | 1.1485          |
| 0.6025        | 3.5879 | 1400 | 1.1298          |
| 0.5766        | 3.8442 | 1500 | 1.1391          |
| 0.4555        | 4.1005 | 1600 | 1.1785          |
| 0.4426        | 4.3567 | 1700 | 1.1874          |
| 0.4461        | 4.6130 | 1800 | 1.1865          |
| 0.4506        | 4.8693 | 1900 | 1.1902          |
| 0.3731        | 5.1256 | 2000 | 1.1996          |

Framework versions

  • PEFT 0.11.1
  • Transformers 4.41.1
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1
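
For inference, the adapter can be loaded on top of the base model with peft. The sketch below assumes this repository hosts a standard PEFT adapter and that its Hub id is `ghost613/gemma2_on_korean_conv-stm` (inferred from the card header); the prompt is only an illustration.

```python
# Minimal loading/inference sketch, assuming a standard PEFT adapter
# published at "ghost613/gemma2_on_korean_conv-stm" (repo id assumed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "beomi/gemma-ko-2b",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("beomi/gemma-ko-2b")

model = PeftModel.from_pretrained(base, "ghost613/gemma2_on_korean_conv-stm")
model.eval()

# Example prompt: "Hello" in Korean.
inputs = tokenizer("안녕하세요", return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```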