---
license: mit
base_model: mikhail-panzo/zlm_b64_le4_s12000
tags:
- generated_from_trainer
model-index:
- name: zlm-fil_b64_le5_s8000
  results: []
---

# zlm-fil_b64_le5_s8000

This model is a fine-tuned version of [mikhail-panzo/zlm_b64_le4_s12000](https://huggingface.co/mikhail-panzo/zlm_b64_le4_s12000) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.4118
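
The card does not state the model architecture, so the snippet below is only a minimal loading sketch: the repo id is inferred from the model name above, and `AutoModel` stands in for the concrete model class.

```python
from transformers import AutoModel

# Minimal loading sketch. The repo id is inferred from the model name above;
# AutoModel is a stand-in, since the card does not state the architecture or
# whether a dedicated processor class is needed.
model = AutoModel.from_pretrained("mikhail-panzo/zlm-fil_b64_le5_s8000")
```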

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto `TrainingArguments` follows the list):

- learning_rate: 1e-05
- train_batch_size: 32
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2000
- training_steps: 8000
- mixed_precision_training: Native AMP
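
For reference, a minimal sketch of how these values might map onto `TrainingArguments` in `transformers`. The `output_dir` name is a placeholder, and single-GPU training is assumed (so 32 × 2 accumulation gives the effective batch size of 64); the Adam betas and epsilon above are the library defaults and need no explicit setting.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the listed hyperparameters; output_dir is a
# placeholder and fp16=True approximates the "Native AMP" mixed precision.
training_args = TrainingArguments(
    output_dir="zlm-fil_b64_le5_s8000",   # placeholder name
    learning_rate=1e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,        # 32 * 2 = total_train_batch_size 64
    lr_scheduler_type="linear",
    warmup_steps=2000,
    max_steps=8000,
    fp16=True,                            # mixed precision via native AMP
)
```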

### Training results

| Training Loss | Epoch    | Step | Validation Loss |
|:-------------:|:--------:|:----:|:---------------:|
| 0.5529        | 22.2222  | 500  | 0.5000          |
| 0.4974        | 44.4444  | 1000 | 0.4557          |
| 0.4716        | 66.6667  | 1500 | 0.4359          |
| 0.453         | 88.8889  | 2000 | 0.4246          |
| 0.4428        | 111.1111 | 2500 | 0.4196          |
| 0.4332        | 133.3333 | 3000 | 0.4171          |
| 0.4246        | 155.5556 | 3500 | 0.4154          |
| 0.4202        | 177.7778 | 4000 | 0.4133          |
| 0.4223        | 200.0    | 4500 | 0.4145          |
| 0.4127        | 222.2222 | 5000 | 0.4118          |
| 0.418         | 244.4444 | 5500 | 0.4130          |
| 0.4137        | 266.6667 | 6000 | 0.4130          |
| 0.4105        | 288.8889 | 6500 | 0.4127          |
| 0.4164        | 311.1111 | 7000 | 0.4127          |
| 0.4088        | 333.3333 | 7500 | 0.4120          |
| 0.4028        | 355.5556 | 8000 | 0.4118          |
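
As a rough sanity check on the log: 500 steps ≈ 22.22 epochs implies about 22.5 optimizer steps per epoch, which at the effective batch size of 64 would put the training set on the order of 1,400 examples; this is inferred from the logged numbers, not stated by the card. Validation loss plateaus from around step 5000, and the final checkpoint matches the best observed value (0.4118).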

### Framework versions

- Transformers 4.41.0.dev0
- Pytorch 2.2.1+cu121
- Datasets 2.19.0
- Tokenizers 0.19.1
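
Note that the `.dev0` suffix on the Transformers version indicates a development build installed from source rather than a PyPI release, so exact reproduction of this environment would require pinning to the corresponding commit of the `transformers` repository.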