
zlm-fil_b64_le5_s8000

This model is a fine-tuned version of mikhail-panzo/zlm_b128_le4_s12000 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4077

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 32
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 2000
  • training_steps: 8000
  • mixed_precision_training: Native AMP
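
For reference, here is a minimal sketch of how these settings map onto `transformers.TrainingArguments`. The trainer class and `output_dir` are assumptions (this card does not state them), and the evaluation/save cadence of 500 steps is inferred from the results table below:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zlm-fil_b64_le5_s8000",  # assumed run/repo name
    learning_rate=1e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,       # effective train batch: 32 * 2 = 64
    lr_scheduler_type="linear",
    warmup_steps=2000,
    max_steps=8000,
    fp16=True,                           # native AMP mixed precision
    # Adam betas=(0.9, 0.999) and epsilon=1e-8 are the optimizer defaults.
    evaluation_strategy="steps",
    eval_steps=500,                      # inferred from the results table
    save_steps=500,
)
```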

Training results

| Training Loss | Epoch    | Step | Validation Loss |
|:-------------:|:--------:|:----:|:---------------:|
| 0.5541        | 21.7391  | 500  | 0.4977          |
| 0.4931        | 43.4783  | 1000 | 0.4529          |
| 0.4695        | 65.2174  | 1500 | 0.4330          |
| 0.4518        | 86.9565  | 2000 | 0.4230          |
| 0.4442        | 108.6957 | 2500 | 0.4179          |
| 0.4344        | 130.4348 | 3000 | 0.4135          |
| 0.4318        | 152.1739 | 3500 | 0.4111          |
| 0.4201        | 173.9130 | 4000 | 0.4110          |
| 0.4185        | 195.6522 | 4500 | 0.4091          |
| 0.4153        | 217.3913 | 5000 | 0.4097          |
| 0.4140        | 239.1304 | 5500 | 0.4069          |
| 0.4113        | 260.8696 | 6000 | 0.4080          |
| 0.4133        | 282.6087 | 6500 | 0.4073          |
| 0.4095        | 304.3478 | 7000 | 0.4059          |
| 0.4129        | 326.0870 | 7500 | 0.4083          |
| 0.4035        | 347.8261 | 8000 | 0.4077          |
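
As a sanity check on the epoch/step columns: 500 steps corresponding to about 21.74 epochs gives roughly 23 optimizer steps per epoch, which at the effective batch size of 64 implies on the order of 1,470 training examples. This is an inference from the table, not a figure stated in this card:

```python
steps, epochs = 500, 21.7391              # first row of the table above
steps_per_epoch = steps / epochs          # ~23
effective_batch = 32 * 2                  # train_batch_size * grad accumulation
approx_examples = steps_per_epoch * effective_batch
print(round(steps_per_epoch), round(approx_examples))  # 23 1472
```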

Framework versions

  • Transformers 4.41.0.dev0
  • Pytorch 2.2.1+cu121
  • Datasets 2.19.0
  • Tokenizers 0.19.1