
gemma7_on_korean_conv

This model is a fine-tuned version of beomi/gemma-ko-7b on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 1.7037

Model description

More information needed

Intended uses & limitations

More information needed
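No usage instructions are provided. As a rough sketch, the checkpoint is a PEFT adapter (see Framework versions below), so it can presumably be loaded on top of the base model with `peft`. The repo id, dtype, and prompt below are assumptions; the prompt format actually used during fine-tuning is unknown:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "beomi/gemma-ko-7b"
adapter_id = "ghost613/gemma7_on_korean_conv"  # this repo (assumed id)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # dtype is a guess; the card does not say
    device_map="auto",
)
# Attach the fine-tuned LoRA/PEFT adapter to the frozen base model.
model = PeftModel.from_pretrained(base, adapter_id)

inputs = tokenizer("안녕하세요?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

This is untested here and requires downloading the 7B base weights; treat it as a starting point, not a verified recipe.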

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 5
  • total_train_batch_size: 5
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • training_steps: 7200
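The listed total train batch size follows from the per-device batch size and gradient accumulation (a single device is assumed here, since the card does not state the device count). A quick sanity check of the numbers above in pure Python:

```python
# Hyperparameters as listed in this card.
train_batch_size = 1
gradient_accumulation_steps = 5
training_steps = 7200

# Effective (total) train batch size: per-device batch size times
# accumulation steps, assuming a single device.
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # → 5, matching "total_train_batch_size: 5"

# Samples processed over the full run.
samples_seen = training_steps * total_train_batch_size
print(samples_seen)  # → 36000
```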

Training results

Training Loss Epoch Step Validation Loss
0.8472 0.1281 200 0.8849
0.7799 0.2563 400 0.8185
0.7791 0.3844 600 0.7945
0.7505 0.5126 800 0.7877
0.7891 0.6407 1000 0.7616
0.689 0.7688 1200 0.7437
0.7612 0.8970 1400 0.7520
0.5183 1.0251 1600 0.8028
0.4562 1.1533 1800 0.7811
0.4584 1.2814 2000 0.7920
0.4535 1.4095 2200 0.7887
0.4268 1.5377 2400 0.8048
0.4368 1.6658 2600 0.7640
0.4435 1.7940 2800 0.7844
0.4327 1.9221 3000 0.7977
0.1711 2.0502 3200 1.0313
0.1856 2.1784 3400 0.9997
0.1812 2.3065 3600 0.9870
0.1876 2.4346 3800 0.9731
0.1927 2.5628 4000 0.9857
0.1964 2.6909 4200 1.0148
0.1948 2.8191 4400 1.0025
0.1865 2.9472 4600 1.0556
0.059 3.0753 4800 1.3127
0.0523 3.2035 5000 1.3947
0.0658 3.3316 5200 1.3980
0.0596 3.4598 5400 1.3785
0.0556 3.5879 5600 1.3936
0.0709 3.7160 5800 1.3858
0.0544 3.8442 6000 1.3943
0.0503 3.9723 6200 1.4319
0.0133 4.1005 6400 1.6485
0.0144 4.2286 6600 1.6932
0.0126 4.3567 6800 1.6980
0.0189 4.4849 7000 1.6962
0.0128 4.6130 7200 1.7037
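The log shows validation loss reaching its minimum around step 1200 and rising steadily afterwards while training loss keeps falling, which suggests the checkpoints past the first epoch overfit. A quick check over a subset of the (step, validation loss) pairs copied from the table above:

```python
# (step, validation_loss) pairs, a subset copied from the table above.
val_losses = [
    (200, 0.8849), (400, 0.8185), (600, 0.7945), (800, 0.7877),
    (1000, 0.7616), (1200, 0.7437), (1400, 0.7520), (1600, 0.8028),
    (2400, 0.8048), (3200, 1.0313), (4800, 1.3127), (7200, 1.7037),
]

# Find the step with the lowest logged validation loss.
best_step, best_loss = min(val_losses, key=lambda p: p[1])
print(best_step, best_loss)  # → 1200 0.7437
```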

Framework versions

  • PEFT 0.11.1
  • Transformers 4.41.1
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1

Model tree for ghost613/gemma7_on_korean_conv

  • Base model: beomi/gemma-ko-7b
  • Adapter: this model