Visualize in Weights & Biases

mistral-7b-sft-25-1

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.3 on the hZzy/SFT_new_mix_full2 dataset. It achieves the following results on the evaluation set:

  • Loss: 1.5649
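Assuming the reported loss is the Trainer's default mean token-level cross-entropy in nats (the card does not say), this corresponds to a validation perplexity of about exp(1.5649) ≈ 4.78. A one-line check:

```python
import math

# Assumption: the eval loss is mean token-level cross-entropy in nats,
# as the Hugging Face Trainer reports by default.
eval_loss = 1.5649
print(f"perplexity ≈ {math.exp(eval_loss):.2f}")  # perplexity ≈ 4.78
```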

Model description

More information needed

Intended uses & limitations

More information needed
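Pending details from the authors, the checkpoint should load like any other Mistral-7B-Instruct-v0.3 derivative. The sketch below is an assumption based on the base model's chat interface, not something confirmed by this card; in particular, it assumes the tokenizer inherits the base model's chat template.

```python
# Hedged sketch: chat-style inference with hZzy/mistral-7b-sft-25-1.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/mistral-7b-sft-25-1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # the published weights are FP16
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize what supervised fine-tuning does."}]
# Assumes the Mistral-Instruct chat template is present in the tokenizer config.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```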

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged TrainingArguments sketch follows the list):

  • learning_rate: 1e-06
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 6
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 384
  • total_eval_batch_size: 48
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
  • mixed_precision_training: Native AMP
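The effective batch size follows from the values above: 8 per device × 8 gradient-accumulation steps × 6 GPUs = 384 for training, and 8 × 6 = 48 for evaluation. As a rough reconstruction (an assumption, since the actual training script is not published here), these settings map onto Hugging Face TrainingArguments like so:

```python
# Hedged sketch: the hyperparameters above expressed as TrainingArguments.
# Adam betas=(0.9, 0.999) and epsilon=1e-08 match the Trainer defaults,
# so they need no explicit arguments.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="mistral-7b-sft-25-1",
    learning_rate=1e-6,
    per_device_train_batch_size=8,  # x 8 grad-accum x 6 GPUs = 384 effective
    per_device_eval_batch_size=8,   # x 6 GPUs = 48 effective
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    fp16=True,  # Native AMP mixed precision
)
```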

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 2.4019        | 0.0299 | 5    | 2.4203          |
| 2.3288        | 0.0597 | 10   | 2.1392          |
| 2.0023        | 0.0896 | 15   | 1.8836          |
| 1.8366        | 0.1194 | 20   | 1.7753          |
| 1.7521        | 0.1493 | 25   | 1.7129          |
| 1.6941        | 0.1791 | 30   | 1.6754          |
| 1.6516        | 0.2090 | 35   | 1.6462          |
| 1.6219        | 0.2388 | 40   | 1.6239          |
| 1.6066        | 0.2687 | 45   | 1.6098          |
| 1.5845        | 0.2985 | 50   | 1.5998          |
| 1.5851        | 0.3284 | 55   | 1.5923          |
| 1.571         | 0.3582 | 60   | 1.5869          |
| 1.5676        | 0.3881 | 65   | 1.5824          |
| 1.5602        | 0.4179 | 70   | 1.5788          |
| 1.5427        | 0.4478 | 75   | 1.5761          |
| 1.5378        | 0.4776 | 80   | 1.5741          |
| 1.5366        | 0.5075 | 85   | 1.5723          |
| 1.534         | 0.5373 | 90   | 1.5706          |
| 1.525         | 0.5672 | 95   | 1.5695          |
| 1.5238        | 0.5970 | 100  | 1.5685          |
| 1.5254        | 0.6269 | 105  | 1.5680          |
| 1.5209        | 0.6567 | 110  | 1.5672          |
| 1.5102        | 0.6866 | 115  | 1.5666          |
| 1.5199        | 0.7164 | 120  | 1.5660          |
| 1.4967        | 0.7463 | 125  | 1.5660          |
| 1.5206        | 0.7761 | 130  | 1.5656          |
| 1.5152        | 0.8060 | 135  | 1.5656          |
| 1.5008        | 0.8358 | 140  | 1.5651          |
| 1.5092        | 0.8657 | 145  | 1.5652          |
| 1.5064        | 0.8955 | 150  | 1.5652          |
| 1.5011        | 0.9254 | 155  | 1.5650          |
| 1.5089        | 0.9552 | 160  | 1.5649          |
| 1.5137        | 0.9851 | 165  | 1.5649          |

Framework versions

  • Transformers 4.42.0
  • PyTorch 2.6.0+cu124
  • Datasets 3.2.0
  • Tokenizers 0.19.1