collapse_gemma-2-27b_hs2_replace_iter3_sftsd0

This model is a fine-tuned version of google/gemma-2-27b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.3653
  • Num Input Tokens Seen: 3955416
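For quick use, a minimal loading sketch with the Transformers library; the dtype, device placement, and prompt below are illustrative choices, not part of the original card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-27b_hs2_replace_iter3_sftsd0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint weights are stored in BF16
    device_map="auto",           # shard the 27.2B-parameter model across available GPUs
)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```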

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
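Assuming the standard Hugging Face Trainer was used (the card does not say), the hyperparameters above map onto `TrainingArguments` roughly as follows; `output_dir` and the `bf16` flag are assumptions, not listed in the card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-27b_hs2_replace_iter3_sftsd0",  # illustrative
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=32,  # 4 x 32 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                  # Adam settings as listed (Transformers defaults)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                       # assumption, based on the BF16 checkpoint
)
```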

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.1282          | 0                 |
| 3.8489        | 0.0583 | 5    | 1.0535          | 228936            |
| 3.3414        | 0.1165 | 10   | 1.1298          | 463812            |
| 2.8437        | 0.1748 | 15   | 1.1488          | 702592            |
| 1.9341        | 0.2331 | 20   | 1.2179          | 938224            |
| 1.1621        | 0.2913 | 25   | 1.2570          | 1165920           |
| 0.6806        | 0.3496 | 30   | 1.2791          | 1403276           |
| 0.6728        | 0.4079 | 35   | 1.2535          | 1650592           |
| 0.5266        | 0.4661 | 40   | 1.2409          | 1880524           |
| 0.5377        | 0.5244 | 45   | 1.2414          | 2104356           |
| 0.4042        | 0.5827 | 50   | 1.2466          | 2335700           |
| 0.7168        | 0.6409 | 55   | 1.2873          | 2564852           |
| 0.3333        | 0.6992 | 60   | 1.3003          | 2791324           |
| 0.5753        | 0.7575 | 65   | 1.3164          | 3032688           |
| 0.3997        | 0.8157 | 70   | 1.3235          | 3267132           |
| 0.3566        | 0.8740 | 75   | 1.3464          | 3502604           |
| 0.4565        | 0.9323 | 80   | 1.3853          | 3727432           |
| 0.1841        | 0.9905 | 85   | 1.3653          | 3955416           |

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
Safetensors

  • Model size: 27.2B params
  • Tensor type: BF16

Model tree for RylanSchaeffer/collapse_gemma-2-27b_hs2_replace_iter3_sftsd0

  • Base model: google/gemma-2-27b (this model is a direct fine-tune)