gemma-3-270m-uzen-base

This model is a fine-tuned version of davron04/gemma-3-270m-uzen-base on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.1987
  • Perplexity: 9.0416
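
As a quick sanity check on the two figures above, the sketch below applies the conventional relation perplexity = exp(mean cross-entropy loss in nats). The variable names are illustrative and not taken from the training code. The result, about 9.01, is close to but not identical to the reported 9.0416. One possible explanation, which is an assumption and not stated in this card, is that perplexity was averaged over evaluation batches instead of being derived from the final mean loss.

```python
import math

# Figures copied from the evaluation results above.
eval_loss = 2.1987
reported_perplexity = 9.0416

# Conventional definition: perplexity = exp(mean cross-entropy in nats).
perplexity_from_loss = math.exp(eval_loss)

# Inverse check: the mean loss that would reproduce the reported perplexity.
loss_from_perplexity = math.log(reported_perplexity)

print(f"exp(reported loss)       = {perplexity_from_loss:.4f}")
print(f"log(reported perplexity) = {loss_from_perplexity:.4f}")
```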

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 64
  • total_train_batch_size: 256
  • total_eval_batch_size: 8
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: inverse_sqrt
  • lr_scheduler_warmup_steps: 0.01
  • num_epochs: 1
  • mixed_precision_training: Native AMP
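
The totals in the list above follow from the per-device values, and the scheduler name refers to an inverse-square-root decay. The sketch below reproduces both pieces of arithmetic. The dictionary keys mirror the list, while `effective_batch_sizes` and `inverse_sqrt_lr` are hypothetical helpers written for illustration. The schedule is a generic form with an assumed `timescale` of 10,000 steps and no warmup; it does not interpret the card's warmup value of 0.01 and is not guaranteed to match the Trainer's implementation exactly.

```python
import math

# Values copied from the hyperparameter list; batch sizes are per device.
hparams = {
    "learning_rate": 2e-05,
    "train_batch_size": 2,
    "eval_batch_size": 4,
    "num_devices": 2,
    "gradient_accumulation_steps": 64,
}


def effective_batch_sizes(h):
    """Return (total_train_batch_size, total_eval_batch_size)."""
    train = (
        h["train_batch_size"]
        * h["num_devices"]
        * h["gradient_accumulation_steps"]
    )
    # Gradient accumulation does not apply during evaluation.
    evaluation = h["eval_batch_size"] * h["num_devices"]
    return train, evaluation


def inverse_sqrt_lr(step, base_lr, warmup_steps=0, timescale=10_000):
    """Generic inverse-square-root schedule (illustrative assumption)."""
    if step < warmup_steps:
        # Linear warmup from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    shift = timescale - warmup_steps
    return base_lr / math.sqrt((step + shift) / timescale)


total_train, total_eval = effective_batch_sizes(hparams)
print(total_train, total_eval)  # prints "256 8", matching the list above
```

With these assumed defaults the learning rate starts at 2e-05 and halves once `step + timescale` reaches four times `timescale`, that is, at step 30,000.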

Training results

Training Loss   Epoch    Step   Validation Loss   Perplexity
No log          0        0      3.3726            29.0075
152.2381        0.1002   428    2.4411            11.5116
145.3987        0.2003   856    2.3491            10.5032
143.7446        0.3005   1284   2.3286            10.2931
140.7659        0.4006   1712   2.2912            9.9159
139.1574        0.5008   2140   2.2643            9.6535
137.4137        0.6009   2568   2.2431            9.4512
136.3983        0.7011   2996   2.2254            9.2859
135.6059        0.8012   3424   2.2111            9.1541
134.8424        0.9014   3852   2.1987            9.0416
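
A few quantities can be derived from the table above. The sketch below is a rough estimate, not a figure reported by the author: it assumes that "Step" counts optimizer updates and that each update consumes the total train batch size of 256 sequences. Under those assumptions it recovers the evaluation interval, the approximate number of steps in one epoch, and the approximate number of training sequences per epoch.

```python
# (epoch, step, validation_loss, perplexity) copied from the table above.
rows = [
    (0.0, 0, 3.3726, 29.0075),
    (0.1002, 428, 2.4411, 11.5116),
    (0.2003, 856, 2.3491, 10.5032),
    (0.3005, 1284, 2.3286, 10.2931),
    (0.4006, 1712, 2.2912, 9.9159),
    (0.5008, 2140, 2.2643, 9.6535),
    (0.6009, 2568, 2.2431, 9.4512),
    (0.7011, 2996, 2.2254, 9.2859),
    (0.8012, 3424, 2.2111, 9.1541),
    (0.9014, 3852, 2.1987, 9.0416),
]

# Assumption: one step is one optimizer update over 256 sequences.
TOTAL_TRAIN_BATCH_SIZE = 256

# Distinct gaps between consecutive evaluation steps.
eval_intervals = {b[1] - a[1] for a, b in zip(rows, rows[1:])}

# Estimate steps per epoch from the last logged (epoch, step) pair.
last_epoch, last_step = rows[-1][0], rows[-1][1]
steps_per_epoch = last_step / last_epoch
sequences_per_epoch = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

# Improvement in validation loss between the first and last evaluation.
loss_drop = rows[0][2] - rows[-1][2]

print(f"evaluation interval(s): {sorted(eval_intervals)}")
print(f"steps per epoch (est.): {steps_per_epoch:.0f}")
print(f"sequences per epoch (est.): {sequences_per_epoch:,.0f}")
print(f"validation loss drop: {loss_drop:.4f}")
```

The estimate comes out at roughly 4,270 steps and about 1.09 million sequences per epoch; the true dataset size is not stated in this card.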

Framework versions

  • Transformers 5.0.0
  • Pytorch 2.10.0+cu128
  • Datasets 4.8.5
  • Tokenizers 0.22.2

Model size

  • Parameters: 0.3B
  • Tensor type: F32
  • Format: Safetensors