
# final_model_5

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2 on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 2.8725

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

The following bitsandbytes quantization config was used during training (a loading sketch follows the list):

  • quant_method: bitsandbytes
  • load_in_8bit: False
  • load_in_4bit: True
  • llm_int8_threshold: 6.0
  • llm_int8_skip_modules: None
  • llm_int8_enable_fp32_cpu_offload: False
  • llm_int8_has_fp16_weight: False
  • bnb_4bit_quant_type: nf4
  • bnb_4bit_use_double_quant: False
  • bnb_4bit_compute_dtype: bfloat16
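For reference, a minimal sketch of loading the base model with this config via the standard transformers + bitsandbytes APIs; the actual training script is not published with this card, and the llm_int8_* values listed above are the library defaults, so only the 4-bit options are passed explicitly:

```python
# A sketch, not the card's actual training script: loading the base model
# with the 4-bit config listed above. The llm_int8_* values in the list
# are bitsandbytes defaults, so they are not passed explicitly here.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # load_in_4bit: True
    bnb_4bit_quant_type="nf4",              # bnb_4bit_quant_type: nf4
    bnb_4bit_use_double_quant=False,        # no nested quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    quantization_config=bnb_config,
    device_map="auto",
)
```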

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0002
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • training_steps: 90
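A minimal sketch mapping these hyperparameters onto a transformers TrainingArguments object; the output_dir name is illustrative, and the Adam betas/epsilon listed above are the Trainer defaults:

```python
# A sketch of the hyperparameters above as TrainingArguments;
# "final_model_5" as output_dir is illustrative, not confirmed by the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="final_model_5",      # illustrative output path
    learning_rate=2e-4,              # learning_rate: 0.0002
    per_device_train_batch_size=8,   # train_batch_size: 8
    per_device_eval_batch_size=4,    # eval_batch_size: 4
    gradient_accumulation_steps=2,   # total train batch size: 8 * 2 = 16
    max_steps=90,                    # training_steps: 90
    lr_scheduler_type="cosine",      # cosine schedule
    warmup_ratio=0.03,               # lr_scheduler_warmup_ratio: 0.03
    seed=42,
)                                    # Adam betas/epsilon left at defaults
```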

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.0462 | 1.0 | 1 | 2.5821 |
| 0.0463 | 2.0 | 2 | 2.6255 |
| 0.0327 | 3.0 | 3 | 2.7177 |
| 0.0374 | 4.0 | 4 | 2.7702 |
| 0.0465 | 5.0 | 5 | 2.7528 |
| 0.029 | 6.0 | 6 | 2.7269 |
| 0.0239 | 7.0 | 7 | 2.6977 |
| 0.0284 | 8.0 | 8 | 2.6762 |
| 0.019 | 9.0 | 9 | 2.6788 |
| 0.0184 | 10.0 | 10 | 2.6653 |
| 0.0283 | 11.0 | 11 | 2.6582 |
| 0.0232 | 12.0 | 12 | 2.6511 |
| 0.0161 | 13.0 | 13 | 2.6508 |
| 0.0158 | 14.0 | 14 | 2.6450 |
| 0.0147 | 15.0 | 15 | 2.6431 |
| 0.0156 | 16.0 | 16 | 2.6449 |
| 0.014 | 17.0 | 17 | 2.6488 |
| 0.0139 | 18.0 | 18 | 2.6530 |
| 0.0137 | 19.0 | 19 | 2.6587 |
| 0.0136 | 20.0 | 20 | 2.6646 |
| 0.0135 | 21.0 | 21 | 2.6703 |
| 0.0134 | 22.0 | 22 | 2.6755 |
| 0.0133 | 23.0 | 23 | 2.6806 |
| 0.0131 | 24.0 | 24 | 2.6858 |
| 0.0131 | 25.0 | 25 | 2.6908 |
| 0.0129 | 26.0 | 26 | 2.6956 |
| 0.0128 | 27.0 | 27 | 2.7001 |
| 0.0127 | 28.0 | 28 | 2.7043 |
| 0.0125 | 29.0 | 29 | 2.7083 |
| 0.0123 | 30.0 | 30 | 2.7120 |
| 0.0121 | 31.0 | 31 | 2.7155 |
| 0.0121 | 32.0 | 32 | 2.7191 |
| 0.0117 | 33.0 | 33 | 2.7227 |
| 0.0115 | 34.0 | 34 | 2.7263 |
| 0.0113 | 35.0 | 35 | 2.7301 |
| 0.0111 | 36.0 | 36 | 2.7340 |
| 0.0108 | 37.0 | 37 | 2.7379 |
| 0.0106 | 38.0 | 38 | 2.7418 |
| 0.0104 | 39.0 | 39 | 2.7457 |
| 0.0104 | 40.0 | 40 | 2.7494 |
| 0.01 | 41.0 | 41 | 2.7532 |
| 0.0098 | 42.0 | 42 | 2.7569 |
| 0.0096 | 43.0 | 43 | 2.7606 |
| 0.0095 | 44.0 | 44 | 2.7643 |
| 0.0094 | 45.0 | 45 | 2.7681 |
| 0.0093 | 46.0 | 46 | 2.7720 |
| 0.0093 | 47.0 | 47 | 2.7760 |
| 0.0092 | 48.0 | 48 | 2.7802 |
| 0.0092 | 49.0 | 49 | 2.7846 |
| 0.0091 | 50.0 | 50 | 2.7892 |
| 0.0091 | 51.0 | 51 | 2.7940 |
| 0.0091 | 52.0 | 52 | 2.7989 |
| 0.0091 | 53.0 | 53 | 2.8039 |
| 0.009 | 54.0 | 54 | 2.8090 |
| 0.009 | 55.0 | 55 | 2.8141 |
| 0.0089 | 56.0 | 56 | 2.8191 |
| 0.0089 | 57.0 | 57 | 2.8239 |
| 0.0088 | 58.0 | 58 | 2.8284 |
| 0.0087 | 59.0 | 59 | 2.8331 |
| 0.0088 | 60.0 | 60 | 2.8372 |
| 0.0087 | 61.0 | 61 | 2.8405 |
| 0.0087 | 62.0 | 62 | 2.8433 |
| 0.0086 | 63.0 | 63 | 2.8457 |
| 0.0086 | 64.0 | 64 | 2.8476 |
| 0.0085 | 65.0 | 65 | 2.8499 |
| 0.0085 | 66.0 | 66 | 2.8514 |
| 0.0085 | 67.0 | 67 | 2.8530 |
| 0.0084 | 68.0 | 68 | 2.8545 |
| 0.0084 | 69.0 | 69 | 2.8560 |
| 0.0084 | 70.0 | 70 | 2.8575 |
| 0.0084 | 71.0 | 71 | 2.8590 |
| 0.0083 | 72.0 | 72 | 2.8605 |
| 0.0083 | 73.0 | 73 | 2.8620 |
| 0.0083 | 74.0 | 74 | 2.8633 |
| 0.0082 | 75.0 | 75 | 2.8646 |
| 0.0082 | 76.0 | 76 | 2.8657 |
| 0.0082 | 77.0 | 77 | 2.8668 |
| 0.0082 | 78.0 | 78 | 2.8679 |
| 0.0081 | 79.0 | 79 | 2.8689 |
| 0.0082 | 80.0 | 80 | 2.8697 |
| 0.0082 | 81.0 | 81 | 2.8705 |
| 0.0081 | 82.0 | 82 | 2.8711 |
| 0.0082 | 83.0 | 83 | 2.8716 |
| 0.0082 | 84.0 | 84 | 2.8719 |
| 0.0081 | 85.0 | 85 | 2.8721 |
| 0.0081 | 86.0 | 86 | 2.8723 |
| 0.0081 | 87.0 | 87 | 2.8724 |
| 0.0081 | 88.0 | 88 | 2.8724 |
| 0.0081 | 89.0 | 89 | 2.8725 |
| 0.0081 | 90.0 | 90 | 2.8725 |

### Framework versions

  • PEFT 0.4.0
  • Transformers 4.37.2
  • Pytorch 2.2.1+cu121
  • Datasets 2.19.0
  • Tokenizers 0.15.2
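
Since this repository holds a PEFT adapter rather than full model weights, it can be applied on top of the base model roughly as follows; the adapter repository id is a placeholder, as the card does not state it:

```python
# A sketch of loading this adapter with PEFT 0.4.0; "<user>/final_model_5"
# is a placeholder repo id, not confirmed by this card.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "<user>/final_model_5")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
```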