
zephyr-7b-sft-full-100ep

This model is a fine-tuned version of mistralai/Mistral-7B-v0.1 on the vipinkatara/SFT_data223 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0012
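
For reference, a minimal inference sketch with the Transformers library is shown below. The repository id is taken from this card's title; that the tokenizer ships a Zephyr-style chat template is an assumption, not something the card states.

```python
# Minimal inference sketch. Assumptions: the checkpoint lives at
# vipinkatara/zephyr-7b-sft-full-100ep and its tokenizer defines a chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vipinkatara/zephyr-7b-sft-full-100ep"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the published weights are stored in BF16
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain supervised fine-tuning in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```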

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto Hugging Face TrainingArguments follows the list):

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 64
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 100
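
As a reference, here is a minimal sketch expressing the values above as transformers.TrainingArguments. The output_dir, gradient_accumulation_steps=1 (implied by total batch 64 = 16 per device × 4 GPUs), BF16 training, and the per-epoch evaluation/logging strategies are assumptions inferred from this card, not confirmed by it.

```python
# Sketch only: the listed hyperparameters expressed as TrainingArguments.
# Assumptions: no gradient accumulation (64 = 16 x 4 devices), BF16 training
# (the published weights are BF16), and one evaluation per epoch (the results
# table reports exactly one row per epoch).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-sft-full-100ep",  # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=1,
    seed=42,
    num_train_epochs=100,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,
    evaluation_strategy="epoch",
    logging_strategy="epoch",
)
```

Launched across 4 GPUs (e.g. via torchrun or accelerate), these per-device values reproduce the listed totals: train batch 64, eval batch 32.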

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.474 | 1.0 | 23 | 0.2754 |
| 0.0863 | 2.0 | 46 | 0.0708 |
| 0.0644 | 3.0 | 69 | 0.0597 |
| 0.0582 | 4.0 | 92 | 0.0562 |
| 0.0557 | 5.0 | 115 | 0.0542 |
| 0.0572 | 6.0 | 138 | 0.0571 |
| 0.0569 | 7.0 | 161 | 0.0550 |
| 0.0551 | 8.0 | 184 | 0.0540 |
| 0.055 | 9.0 | 207 | 0.0530 |
| 0.0921 | 10.0 | 230 | 0.4721 |
| 0.1954 | 11.0 | 253 | 0.2910 |
| 0.0926 | 12.0 | 276 | 0.0766 |
| 0.0558 | 13.0 | 299 | 0.0519 |
| 0.0538 | 14.0 | 322 | 0.0493 |
| 0.0517 | 15.0 | 345 | 0.0479 |
| 0.0489 | 16.0 | 368 | 0.0466 |
| 0.0466 | 17.0 | 391 | 0.0416 |
| 0.0418 | 18.0 | 414 | 0.0349 |
| 0.0347 | 19.0 | 437 | 0.0298 |
| 0.0304 | 20.0 | 460 | 0.0256 |
| 0.0252 | 21.0 | 483 | 0.0192 |
| 0.0201 | 22.0 | 506 | 0.0128 |
| 0.0128 | 23.0 | 529 | 0.0107 |
| 0.0095 | 24.0 | 552 | 0.0054 |
| 0.0062 | 25.0 | 575 | 0.0038 |
| 0.005 | 26.0 | 598 | 0.0029 |
| 0.0038 | 27.0 | 621 | 0.0024 |
| 0.0032 | 28.0 | 644 | 0.0022 |
| 0.0028 | 29.0 | 667 | 0.0019 |
| 0.0026 | 30.0 | 690 | 0.0018 |
| 0.0022 | 31.0 | 713 | 0.0016 |
| 0.002 | 32.0 | 736 | 0.0015 |
| 0.0019 | 33.0 | 759 | 0.0015 |
| 0.0018 | 34.0 | 782 | 0.0015 |
| 0.0018 | 35.0 | 805 | 0.0014 |
| 0.0018 | 36.0 | 828 | 0.0014 |
| 0.0017 | 37.0 | 851 | 0.0014 |
| 0.0017 | 38.0 | 874 | 0.0014 |
| 0.0021 | 39.0 | 897 | 0.0020 |
| 0.0023 | 40.0 | 920 | 0.0018 |
| 0.0019 | 41.0 | 943 | 0.0017 |
| 0.0019 | 42.0 | 966 | 0.0016 |
| 0.0018 | 43.0 | 989 | 0.0015 |
| 0.0017 | 44.0 | 1012 | 0.0014 |
| 0.0017 | 45.0 | 1035 | 0.0014 |
| 0.0016 | 46.0 | 1058 | 0.0015 |
| 0.0019 | 47.0 | 1081 | 0.0014 |
| 0.0017 | 48.0 | 1104 | 0.0015 |
| 0.0017 | 49.0 | 1127 | 0.0015 |
| 0.0036 | 50.0 | 1150 | 0.0039 |
| 0.0029 | 51.0 | 1173 | 0.0031 |
| 0.0021 | 52.0 | 1196 | 0.0018 |
| 0.0017 | 53.0 | 1219 | 0.0015 |
| 0.0017 | 54.0 | 1242 | 0.0014 |
| 0.0016 | 55.0 | 1265 | 0.0014 |
| 0.0015 | 56.0 | 1288 | 0.0013 |
| 0.0015 | 57.0 | 1311 | 0.0013 |
| 0.0014 | 58.0 | 1334 | 0.0013 |
| 0.0014 | 59.0 | 1357 | 0.0013 |
| 0.0014 | 60.0 | 1380 | 0.0013 |
| 0.0013 | 61.0 | 1403 | 0.0013 |
| 0.0014 | 62.0 | 1426 | 0.0012 |
| 0.0013 | 63.0 | 1449 | 0.0012 |
| 0.0013 | 64.0 | 1472 | 0.0012 |
| 0.0014 | 65.0 | 1495 | 0.0012 |
| 0.0013 | 66.0 | 1518 | 0.0012 |
| 0.0013 | 67.0 | 1541 | 0.0012 |
| 0.0013 | 68.0 | 1564 | 0.0012 |
| 0.0014 | 69.0 | 1587 | 0.0012 |
| 0.0013 | 70.0 | 1610 | 0.0012 |
| 0.0014 | 71.0 | 1633 | 0.0012 |
| 0.0014 | 72.0 | 1656 | 0.0012 |
| 0.0013 | 73.0 | 1679 | 0.0012 |
| 0.0013 | 74.0 | 1702 | 0.0012 |
| 0.0013 | 75.0 | 1725 | 0.0012 |
| 0.0013 | 76.0 | 1748 | 0.0012 |
| 0.0013 | 77.0 | 1771 | 0.0012 |
| 0.0012 | 78.0 | 1794 | 0.0012 |
| 0.0013 | 79.0 | 1817 | 0.0012 |
| 0.0012 | 80.0 | 1840 | 0.0012 |
| 0.0013 | 81.0 | 1863 | 0.0012 |
| 0.0013 | 82.0 | 1886 | 0.0012 |
| 0.0013 | 83.0 | 1909 | 0.0012 |
| 0.0012 | 84.0 | 1932 | 0.0012 |
| 0.0012 | 85.0 | 1955 | 0.0012 |
| 0.0013 | 86.0 | 1978 | 0.0012 |
| 0.0012 | 87.0 | 2001 | 0.0012 |
| 0.0013 | 88.0 | 2024 | 0.0012 |
| 0.0012 | 89.0 | 2047 | 0.0012 |
| 0.0013 | 90.0 | 2070 | 0.0012 |
| 0.0011 | 91.0 | 2093 | 0.0012 |
| 0.0012 | 92.0 | 2116 | 0.0012 |
| 0.0012 | 93.0 | 2139 | 0.0012 |
| 0.0013 | 94.0 | 2162 | 0.0012 |
| 0.0012 | 95.0 | 2185 | 0.0012 |
| 0.0013 | 96.0 | 2208 | 0.0012 |
| 0.0012 | 97.0 | 2231 | 0.0012 |
| 0.0011 | 98.0 | 2254 | 0.0012 |
| 0.0012 | 99.0 | 2277 | 0.0012 |
| 0.0012 | 100.0 | 2300 | 0.0012 |
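
The validation loss falls steadily apart from transient spikes around epochs 10-11 and 50-51, then plateaus at 0.0012 from roughly epoch 62 onward. Below is a minimal sketch for plotting the curve; matplotlib is an assumption (it is not among the framework versions listed), and only a handful of epochs from the table are hard-coded here.

```python
# Minimal sketch: visualize the validation-loss trend from the table above.
# Only representative epochs are hard-coded, copied verbatim from the table.
import matplotlib.pyplot as plt

epochs =   [1, 5, 10, 11, 15, 20, 25, 30, 40, 50, 60, 80, 100]
val_loss = [0.2754, 0.0542, 0.4721, 0.2910, 0.0479, 0.0256, 0.0038,
            0.0018, 0.0018, 0.0039, 0.0013, 0.0012, 0.0012]

plt.semilogy(epochs, val_loss, marker="o")  # log scale keeps the spikes at
plt.xlabel("Epoch")                         # epochs 10-11 and 50-51 visible
plt.ylabel("Validation loss (log scale)")
plt.title("zephyr-7b-sft-full-100ep validation loss")
plt.show()
```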

Framework versions

  • Transformers 4.40.2
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1
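
To reproduce the reported numbers it is prudent to match these versions. A small sanity check, assuming the standard distribution names (torch for Pytorch, etc.), might look like:

```python
# Sketch: warn when the local environment drifts from the versions listed above.
import importlib.metadata as md

pinned = {
    "transformers": "4.40.2",
    "torch": "2.3.0+cu121",
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}

for pkg, want in pinned.items():
    have = md.version(pkg)
    if have != want:
        print(f"warning: {pkg} {have} installed; this card was produced with {want}")
```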

Model size

  • 7.24B parameters (Safetensors, BF16)
