mistral_darulm_20_05_24_part1-2_32000_bpe_full_lr1e4_bs256
This model is a fine-tuned version of RefalMachine/mistral_darulm_20_05_24_part1-2_32000_bpe_mean_init_03_07_24 on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 2.0198
- Accuracy: 0.5685
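For readers who prefer perplexity, the reported loss can be converted with a short calculation. This is a minimal sketch that assumes the loss is the mean per-token cross-entropy in nats and that the accuracy is top-1 next-token accuracy; the card does not state either explicitly.

```python
import math

# Values reported in this card.
eval_loss = 2.0198      # assumption: mean per-token cross-entropy (natural log)
eval_accuracy = 0.5685  # assumption: top-1 next-token accuracy

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(eval_loss)
error_rate = 1.0 - eval_accuracy

print(f"perplexity ~ {perplexity:.2f}")
print(f"next-token error rate ~ {error_rate:.4f}")
```

Under those assumptions the perplexity comes out at roughly 7.5.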
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 64
- total_train_batch_size: 256
- total_eval_batch_size: 256
- optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-05
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 1.0
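The batch-size entries above are related by simple arithmetic. The sketch below checks that relation; `gradient_accumulation_steps` is not listed in this card, so the value derived here is an inference from the listed numbers rather than a documented setting.

```python
# Values listed in the hyperparameters above.
per_device_train_batch_size = 4
num_devices = 64
total_train_batch_size = 256


def effective_batch_size(per_device: int, devices: int, grad_accum_steps: int = 1) -> int:
    """Number of sequences consumed per optimizer step."""
    return per_device * devices * grad_accum_steps


# Gradient accumulation implied by the listed totals (not stated in the card).
implied_grad_accum = total_train_batch_size // (per_device_train_batch_size * num_devices)

assert effective_batch_size(
    per_device_train_batch_size, num_devices, implied_grad_accum
) == total_train_batch_size
print("implied gradient accumulation steps:", implied_grad_accum)
```

Since 4 × 64 already equals 256, the listed totals are consistent with no gradient accumulation.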
Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy |
---|---|---|---|---|
2.3529 | 0.04 | 2000 | 2.1464 | 0.5505 |
2.3262 | 0.09 | 4000 | 2.1167 | 0.5540 |
2.2945 | 0.13 | 6000 | 2.1000 | 0.5563 |
2.2961 | 0.18 | 8000 | 2.0909 | 0.5571 |
2.2943 | 0.22 | 10000 | 2.0807 | 0.5588 |
2.2748 | 0.26 | 12000 | 2.0766 | 0.5595 |
2.2741 | 0.31 | 14000 | 2.0678 | 0.5607 |
2.2538 | 0.35 | 16000 | 2.0620 | 0.5620 |
2.2802 | 0.39 | 18000 | 2.0558 | 0.5627 |
2.2613 | 0.44 | 20000 | 2.0485 | 0.5638 |
2.2430 | 0.48 | 22000 | 2.0431 | 0.5646 |
2.2438 | 0.53 | 24000 | 2.0381 | 0.5654 |
2.2478 | 0.57 | 26000 | 2.0327 | 0.5664 |
2.2143 | 0.61 | 28000 | 2.0288 | 0.5669 |
2.2207 | 0.66 | 30000 | 2.0255 | 0.5674 |
2.2236 | 0.70 | 32000 | 2.0233 | 0.5679 |
2.2279 | 0.74 | 34000 | 2.0216 | 0.5682 |
2.2270 | 0.79 | 36000 | 2.0207 | 0.5684 |
2.2343 | 0.83 | 38000 | 2.0202 | 0.5684 |
2.2226 | 0.88 | 40000 | 2.0199 | 0.5685 |
2.2162 | 0.92 | 42000 | 2.0199 | 0.5685 |
2.2351 | 0.96 | 44000 | 2.0198 | 0.5685 |
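The flattening of the validation loss in the last rows is consistent with the cosine schedule and 100 warmup steps listed under the hyperparameters, since the learning rate is close to zero by then. The sketch below reproduces the usual linear-warmup plus half-cosine decay shape. The total step count is an assumption estimated from the last table row (step 44000 at epoch 0.96, with the epoch value rounded to two decimals), and `cosine_lr` is a hypothetical helper written for illustration, not a function from the training code.

```python
import math

BASE_LR = 1e-4       # learning_rate from the card
WARMUP_STEPS = 100   # lr_scheduler_warmup_steps from the card
# Assumption: total optimizer steps estimated from the last table row.
TOTAL_STEPS = round(44000 / 0.96)


def cosine_lr(step: int, base_lr: float = BASE_LR,
              warmup: int = WARMUP_STEPS, total: int = TOTAL_STEPS) -> float:
    """Linear warmup to base_lr, then half-cosine decay towards zero."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = min(1.0, (step - warmup) / max(1, total - warmup))
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Learning rate at a few of the logged evaluation steps.
for step in (0, 100, 2000, 22000, 44000):
    print(f"step {step:>6}: lr = {cosine_lr(step):.2e}")
```

With these assumptions the learning rate at step 44000 is a small fraction of its peak, so little further change in validation loss would be expected in the remaining steps.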
Framework versions
- Transformers 4.37.2
- Pytorch 2.3.0a0+6ddf5cf85e.nv24.04
- Datasets 2.18.0
- Tokenizers 0.15.2