
zlm-ceb_b64_le5_s8000

This model is a fine-tuned version of mikhail-panzo/zlm_b64_le4_s12000 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4051
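
If you want to inspect the checkpoint, it can be pulled through the generic transformers auto classes. A minimal sketch; the repo id below is an assumption inferred from the card title and the parent checkpoint's namespace, and `AutoModel` loads the base model regardless of the task head:

```python
from transformers import AutoModel

# Assumed repo id (card title + parent checkpoint's namespace); adjust if
# the model is hosted elsewhere.
model = AutoModel.from_pretrained("mikhail-panzo/zlm-ceb_b64_le5_s8000")

# Sanity check: total parameter count.
print(sum(p.numel() for p in model.parameters()))
```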

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the equivalent `TrainingArguments` follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 32
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 2000
  • training_steps: 8000
  • mixed_precision_training: Native AMP
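
As a rough guide to reproducing this setup, the hyperparameters above map onto the transformers `TrainingArguments` as sketched below. This is an assumption about how the run was configured, not the author's actual script; the output directory is a placeholder, and the Adam betas/epsilon listed above are the library defaults, so they are not set explicitly:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./zlm-ceb_b64_le5_s8000",  # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,   # effective train batch size: 32 * 2 = 64
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=2000,
    max_steps=8000,
    fp16=True,                       # Native AMP mixed precision
    evaluation_strategy="steps",
    eval_steps=500,                  # matches the 500-step cadence in the results table
    logging_steps=500,
)
```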

Training results

| Training Loss | Epoch    | Step | Validation Loss |
|:-------------:|:--------:|:----:|:---------------:|
| 0.4626        | 19.6078  | 500  | 0.4263          |
| 0.4288        | 39.2157  | 1000 | 0.4077          |
| 0.4109        | 58.8235  | 1500 | 0.4013          |
| 0.3978        | 78.4314  | 2000 | 0.4035          |
| 0.3898        | 98.0392  | 2500 | 0.4013          |
| 0.3730        | 117.6471 | 3000 | 0.4010          |
| 0.3644        | 137.2549 | 3500 | 0.4005          |
| 0.3569        | 156.8627 | 4000 | 0.4029          |
| 0.3515        | 176.4706 | 4500 | 0.4039          |
| 0.3443        | 196.0784 | 5000 | 0.4005          |
| 0.3469        | 215.6863 | 5500 | 0.4018          |
| 0.3427        | 235.2941 | 6000 | 0.4001          |
| 0.3401        | 254.9020 | 6500 | 0.4042          |
| 0.3419        | 274.5098 | 7000 | 0.4054          |
| 0.3318        | 294.1176 | 7500 | 0.4057          |
| 0.3312        | 313.7255 | 8000 | 0.4051          |
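
The trend is easier to see plotted: training loss keeps falling while validation loss plateaus near 0.40 from step 1500 onward. A minimal matplotlib sketch using the logged values above:

```python
import matplotlib.pyplot as plt

# Values copied from the training results table above.
steps = [500, 1000, 1500, 2000, 2500, 3000, 3500, 4000,
         4500, 5000, 5500, 6000, 6500, 7000, 7500, 8000]
train_loss = [0.4626, 0.4288, 0.4109, 0.3978, 0.3898, 0.3730, 0.3644, 0.3569,
              0.3515, 0.3443, 0.3469, 0.3427, 0.3401, 0.3419, 0.3318, 0.3312]
val_loss = [0.4263, 0.4077, 0.4013, 0.4035, 0.4013, 0.4010, 0.4005, 0.4029,
            0.4039, 0.4005, 0.4018, 0.4001, 0.4042, 0.4054, 0.4057, 0.4051]

plt.plot(steps, train_loss, marker="o", label="training loss")
plt.plot(steps, val_loss, marker="o", label="validation loss")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.show()
```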

Framework versions

  • Transformers 4.41.0.dev0
  • Pytorch 2.2.1+cu121
  • Datasets 2.19.0
  • Tokenizers 0.19.1
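
To check that a local environment matches the versions above (note that Transformers 4.41.0.dev0 is a development build, so an install from source may be needed), a quick sanity check:

```python
import transformers, torch, datasets, tokenizers

# Expected: 4.41.0.dev0, 2.2.1+cu121, 2.19.0, 0.19.1
print(transformers.__version__, torch.__version__,
      datasets.__version__, tokenizers.__version__)
```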