metadata

base_model: mikhail-panzo/zlm_b128_le4_s12000
tags:
  - generated_from_trainer
model-index:
  - name: zlm-fil_b64_le5_s8000
    results: []

zlm-fil_b64_le5_s8000

This model is a fine-tuned version of mikhail-panzo/zlm_b128_le4_s12000 on an unspecified dataset. At the final training step (8000), it achieves the following results on the evaluation set:

Loss: 0.4077

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 32
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 2000
training_steps: 8000
mixed_precision_training: Native AMP
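
The relationship between some of these values can be sketched in a few lines of Python. This is a minimal illustration, not the training script: it assumes a single device (the card does not state the device count) and the conventional meaning of a linear schedule with warmup, where the learning rate rises linearly from 0 to its peak over the warmup steps and then decays linearly to 0 at the final step. The function names are hypothetical.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-05
WARMUP_STEPS = 2000
TRAINING_STEPS = 8000
TRAIN_BATCH_SIZE = 32
GRADIENT_ACCUMULATION_STEPS = 2
NUM_DEVICES = 1  # assumption: the card does not state the device count


def effective_batch_size() -> int:
    """Examples consumed per optimizer step."""
    return TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < WARMUP_STEPS:
        # Warmup phase: ramp from 0 up to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Decay phase: ramp from the peak down to 0 at the last step.
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    for step in (0, 1000, 2000, 5000, 8000):
        print(step, linear_lr(step))
```

Under these assumptions the effective batch size matches the reported total_train_batch_size of 64, and the peak learning rate is reached only at step 2000, a quarter of the way through training.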

Training results

Training Loss	Epoch	Step	Validation Loss
0.5541	21.7391	500	0.4977
0.4931	43.4783	1000	0.4529
0.4695	65.2174	1500	0.4330
0.4518	86.9565	2000	0.4230
0.4442	108.6957	2500	0.4179
0.4344	130.4348	3000	0.4135
0.4318	152.1739	3500	0.4111
0.4201	173.9130	4000	0.4110
0.4185	195.6522	4500	0.4091
0.4153	217.3913	5000	0.4097
0.4140	239.1304	5500	0.4069
0.4113	260.8696	6000	0.4080
0.4133	282.6087	6500	0.4073
0.4095	304.3478	7000	0.4059
0.4129	326.0870	7500	0.4083
0.4035	347.8261	8000	0.4077
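
The table above can be summarized with a short script. The sketch below only re-reads the logged numbers: it derives the number of optimizer steps per epoch from the Epoch and Step columns and locates the checkpoint with the lowest validation loss. The `Row` type and the function names are hypothetical helpers introduced for illustration.

```python
from typing import NamedTuple


class Row(NamedTuple):
    training_loss: float
    epoch: float
    step: int
    validation_loss: float


# Rows copied from the training results table above.
LOG = """\
0.5541 21.7391 500 0.4977
0.4931 43.4783 1000 0.4529
0.4695 65.2174 1500 0.4330
0.4518 86.9565 2000 0.4230
0.4442 108.6957 2500 0.4179
0.4344 130.4348 3000 0.4135
0.4318 152.1739 3500 0.4111
0.4201 173.9130 4000 0.4110
0.4185 195.6522 4500 0.4091
0.4153 217.3913 5000 0.4097
0.4140 239.1304 5500 0.4069
0.4113 260.8696 6000 0.4080
0.4133 282.6087 6500 0.4073
0.4095 304.3478 7000 0.4059
0.4129 326.0870 7500 0.4083
0.4035 347.8261 8000 0.4077
"""


def parse_log(text: str) -> list[Row]:
    """Turn whitespace-separated log lines into typed rows."""
    rows = []
    for line in text.strip().splitlines():
        train, epoch, step, val = line.split()
        rows.append(Row(float(train), float(epoch), int(step), float(val)))
    return rows


def steps_per_epoch(rows: list[Row]) -> int:
    """Optimizer steps per epoch, implied by the Step and Epoch columns."""
    estimates = {round(r.step / r.epoch) for r in rows}
    if len(estimates) != 1:
        raise ValueError(f"inconsistent estimates: {sorted(estimates)}")
    return estimates.pop()


def best_checkpoint(rows: list[Row]) -> Row:
    """Logged row with the lowest validation loss."""
    return min(rows, key=lambda r: r.validation_loss)


if __name__ == "__main__":
    rows = parse_log(LOG)
    print("steps per epoch:", steps_per_epoch(rows))
    print("best:", best_checkpoint(rows))
    print("final:", rows[-1])
```

Read this way, the log implies 23 optimizer steps per epoch, and the lowest logged validation loss (0.4059) occurs at step 7000 rather than at the final step, whose loss of 0.4077 is the figure reported at the top of the card.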

Framework versions

Transformers 4.41.0.dev0
PyTorch 2.2.1+cu121
Datasets 2.19.0
Tokenizers 0.19.1
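
To check a local environment against the versions listed above, a sketch along the following lines may help. It uses only the Python standard library; the mapping from the names in the card to PyPI distribution names (for example, PyTorch to `torch`) is an assumption, and `compare_versions` is a hypothetical helper.

```python
from importlib import metadata

# Versions copied from the list above; distribution names are assumed.
PINNED = {
    "transformers": "4.41.0.dev0",
    "torch": "2.2.1+cu121",
    "datasets": "2.19.0",
    "tokenizers": "0.19.1",
}


def compare_versions(pinned: dict[str, str]) -> dict[str, tuple]:
    """Map each package to (expected, installed or None, exact match?)."""
    report = {}
    for package, expected in pinned.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (expected, installed, installed == expected)
    return report


if __name__ == "__main__":
    for package, (expected, installed, ok) in compare_versions(PINNED).items():
        status = "match" if ok else "differs"
        print(f"{package}: expected {expected}, found {installed} ({status})")
```

An exact match is not necessarily required to use the model; the comparison only shows where a local setup departs from the environment recorded in the card.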