checkpoint_dir

This model is a fine-tuned version of microsoft/Phi-3.5-mini-instruct. The training dataset is not documented. At the end of training (step 591) it achieves the following result on the evaluation set:

  • Loss: 0.5253

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 0
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.01
  • num_epochs: 8
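
To make the schedule implied by these hyperparameters concrete, here is a minimal sketch of a linear-warmup-then-cosine learning-rate curve. It assumes the shape commonly used by the Transformers Trainer for `lr_scheduler_type: cosine`; the card itself only states the scheduler type and warmup ratio. The total step count (74 steps per epoch for 8 epochs) is inferred from the results table, where epoch 3.0 is logged at step 222, and the function names are illustrative.

```python
import math

BASE_LR = 1e-4          # learning_rate from the card
WARMUP_RATIO = 0.01     # lr_scheduler_warmup_ratio from the card
# Assumption: 74 optimizer steps per epoch (epoch 3.0 is logged at step 222), 8 epochs.
TOTAL_STEPS = 74 * 8


def warmup_steps(total_steps: int = TOTAL_STEPS, ratio: float = WARMUP_RATIO) -> int:
    """Number of warmup steps, rounded up from ratio * total_steps."""
    return math.ceil(total_steps * ratio)


def lr_at(step: int, total_steps: int = TOTAL_STEPS,
          base_lr: float = BASE_LR, ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at a given step: linear warmup, then half-cosine decay to zero."""
    warmup = warmup_steps(total_steps, ratio)
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (0, 3, warmup_steps(), TOTAL_STEPS // 2, TOTAL_STEPS):
        print(step, lr_at(step))
```

Under these assumptions the warmup lasts only a handful of steps, so almost the entire run is spent on the cosine decay.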

Training results

Training Loss Epoch Step Validation Loss
1.7134 0.0405 3 1.8221
1.7619 0.0811 6 1.6961
1.5449 0.1216 9 1.5453
1.3625 0.1622 12 1.3982
1.1389 0.2027 15 1.2786
1.1091 0.2432 18 1.1889
1.0605 0.2838 21 1.1050
0.9908 0.3243 24 1.0395
0.9653 0.3649 27 0.9886
0.9258 0.4054 30 0.9401
0.8964 0.4459 33 0.8945
0.8189 0.4865 36 0.8615
0.7202 0.5270 39 0.8325
0.7553 0.5676 42 0.8109
0.7415 0.6081 45 0.7911
0.6421 0.6486 48 0.7730
0.7638 0.6892 51 0.7411
0.7495 0.7297 54 0.7208
0.7678 0.7703 57 0.7102
0.7027 0.8108 60 0.7002
0.7106 0.8514 63 0.6892
0.8461 0.8919 66 0.6852
0.5863 0.9324 69 0.6826
0.7466 0.9730 72 0.6802
0.5847 1.0135 75 0.6696
0.5349 1.0541 78 0.6590
0.5991 1.0946 81 0.6560
0.5777 1.1351 84 0.6526
0.6342 1.1757 87 0.6488
0.5053 1.2162 90 0.6494
0.4909 1.2568 93 0.6485
0.5154 1.2973 96 0.6458
0.4728 1.3378 99 0.6375
0.5648 1.3784 102 0.6327
0.4878 1.4189 105 0.6260
0.5677 1.4595 108 0.6165
0.6598 1.5 111 0.6059
0.5811 1.5405 114 0.6021
0.5984 1.5811 117 0.6018
0.4477 1.6216 120 0.6010
0.5762 1.6622 123 0.5944
0.7896 1.7027 126 0.5924
0.449 1.7432 129 0.5849
0.6014 1.7838 132 0.5793
0.4798 1.8243 135 0.5744
0.4943 1.8649 138 0.5715
0.3907 1.9054 141 0.5692
0.6352 1.9459 144 0.5631
0.469 1.9865 147 0.5633
0.4819 2.0270 150 0.5623
0.7567 2.0676 153 0.5610
0.533 2.1081 156 0.5641
0.4195 2.1486 159 0.5615
0.4015 2.1892 162 0.5609
0.2958 2.2297 165 0.5642
0.4477 2.2703 168 0.5602
0.4111 2.3108 171 0.5530
0.3958 2.3514 174 0.5495
0.3053 2.3919 177 0.5437
0.4952 2.4324 180 0.5400
0.5617 2.4730 183 0.5322
0.298 2.5135 186 0.5273
0.5439 2.5541 189 0.5256
0.5791 2.5946 192 0.5215
0.4429 2.6351 195 0.5205
0.4454 2.6757 198 0.5251
0.4071 2.7162 201 0.5267
0.3948 2.7568 204 0.5327
0.3196 2.7973 207 0.5342
0.3567 2.8378 210 0.5344
0.5284 2.8784 213 0.5292
0.491 2.9189 216 0.5182
0.4267 2.9595 219 0.5137
0.3587 3.0 222 0.5098
0.3587 3.0405 225 0.5131
0.377 3.0811 228 0.5200
0.6423 3.1216 231 0.5214
0.4839 3.1622 234 0.5139
0.566 3.2027 237 0.5123
0.38 3.2432 240 0.5172
0.3995 3.2838 243 0.5207
0.3486 3.3243 246 0.5148
0.2418 3.3649 249 0.5104
0.3178 3.4054 252 0.5086
0.4065 3.4459 255 0.5031
0.3472 3.4865 258 0.5050
0.4543 3.5270 261 0.5046
0.4066 3.5676 264 0.5020
0.2606 3.6081 267 0.5010
0.2332 3.6486 270 0.5007
0.5026 3.6892 273 0.5003
0.3901 3.7297 276 0.5057
0.3552 3.7703 279 0.5126
0.3921 3.8108 282 0.5179
0.3366 3.8514 285 0.5092
0.3706 3.8919 288 0.5008
0.2791 3.9324 291 0.4961
0.2247 3.9730 294 0.4968
0.2879 4.0135 297 0.4971
0.3355 4.0541 300 0.5036
0.3928 4.0946 303 0.5023
0.2399 4.1351 306 0.5056
0.3396 4.1757 309 0.5089
0.2602 4.2162 312 0.5091
0.2565 4.2568 315 0.5110
0.24 4.2973 318 0.5156
0.2364 4.3378 321 0.5216
0.3694 4.3784 324 0.5224
0.2185 4.4189 327 0.5183
0.337 4.4595 330 0.5119
0.3404 4.5 333 0.5084
0.3049 4.5405 336 0.5071
0.4811 4.5811 339 0.5098
0.338 4.6216 342 0.5092
0.305 4.6622 345 0.5090
0.5273 4.7027 348 0.5079
0.3122 4.7432 351 0.5044
0.2995 4.7838 354 0.4991
0.2654 4.8243 357 0.4935
0.3992 4.8649 360 0.4946
0.2272 4.9054 363 0.5003
0.3094 4.9459 366 0.5026
0.2773 4.9865 369 0.5021
0.3934 5.0270 372 0.4993
0.271 5.0676 375 0.5015
0.3928 5.1081 378 0.5040
0.2105 5.1486 381 0.5134
0.2548 5.1892 384 0.5182
0.2424 5.2297 387 0.5104
0.4469 5.2703 390 0.5122
0.2866 5.3108 393 0.5112
0.2958 5.3514 396 0.5090
0.2034 5.3919 399 0.5051
0.4091 5.4324 402 0.5023
0.1415 5.4730 405 0.5059
0.4137 5.5135 408 0.5098
0.2784 5.5541 411 0.5134
0.158 5.5946 414 0.5160
0.4701 5.6351 417 0.5183
0.2256 5.6757 420 0.5168
0.1868 5.7162 423 0.5147
0.2868 5.7568 426 0.5130
0.2142 5.7973 429 0.5147
0.2693 5.8378 432 0.5130
0.2882 5.8784 435 0.5108
0.3243 5.9189 438 0.5098
0.343 5.9595 441 0.5067
0.2602 6.0 444 0.5002
0.2237 6.0405 447 0.5001
0.3727 6.0811 450 0.5039
0.2471 6.1216 453 0.5076
0.4095 6.1622 456 0.5145
0.2445 6.2027 459 0.5188
0.2387 6.2432 462 0.5231
0.2322 6.2838 465 0.5258
0.2998 6.3243 468 0.5270
0.2463 6.3649 471 0.5251
0.1931 6.4054 474 0.5237
0.2254 6.4459 477 0.5187
0.278 6.4865 480 0.5177
0.3654 6.5270 483 0.5162
0.2886 6.5676 486 0.5130
0.229 6.6081 489 0.5150
0.2361 6.6486 492 0.5158
0.1497 6.6892 495 0.5165
0.2926 6.7297 498 0.5179
0.2979 6.7703 501 0.5211
0.244 6.8108 504 0.5200
0.2846 6.8514 507 0.5197
0.1897 6.8919 510 0.5200
0.2106 6.9324 513 0.5210
0.3168 6.9730 516 0.5210
0.2002 7.0135 519 0.5192
0.3515 7.0541 522 0.5202
0.1807 7.0946 525 0.5214
0.2331 7.1351 528 0.5212
0.1571 7.1757 531 0.5215
0.186 7.2162 534 0.5194
0.2281 7.2568 537 0.5207
0.2534 7.2973 540 0.5219
0.3643 7.3378 543 0.5212
0.4516 7.3784 546 0.5203
0.181 7.4189 549 0.5226
0.256 7.4595 552 0.5214
0.2802 7.5 555 0.5212
0.1913 7.5405 558 0.5196
0.2293 7.5811 561 0.5207
0.2282 7.6216 564 0.5213
0.1954 7.6622 567 0.5225
0.3199 7.7027 570 0.5216
0.2687 7.7432 573 0.5231
0.2122 7.7838 576 0.5218
0.3616 7.8243 579 0.5228
0.1206 7.8649 582 0.5212
0.148 7.9054 585 0.5216
0.3779 7.9459 588 0.5224
0.272 7.9865 591 0.5253
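
The table can be scanned programmatically to locate the lowest validation loss. The sketch below does this over a small excerpt of rows copied from the table above; the variable and function names are illustrative, and in practice one would feed it the full log.

```python
# (step, epoch, validation_loss) rows copied from the results table.
# This is an excerpt for illustration, not the full log.
EVAL_LOG = [
    (3, 0.0405, 1.8221),
    (222, 3.0, 0.5098),
    (357, 4.8243, 0.4935),
    (444, 6.0, 0.5002),
    (591, 7.9865, 0.5253),
]


def best_eval(log):
    """Return the (step, epoch, loss) row with the lowest validation loss."""
    return min(log, key=lambda row: row[2])


best_step, best_epoch, best_loss = best_eval(EVAL_LOG)
final_step, final_epoch, final_loss = EVAL_LOG[-1]

print(f"best:  step {best_step} (epoch {best_epoch}), val loss {best_loss}")
print(f"final: step {final_step} (epoch {final_epoch}), val loss {final_loss}")
```

In the full table the lowest validation loss is likewise 0.4935 at step 357, while the final value is 0.5253 and the training loss keeps falling. That pattern may indicate overfitting in the later epochs, so an earlier checkpoint could be worth evaluating if one was saved.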

Framework versions

  • PEFT 0.13.2
  • Transformers 4.45.2
  • Pytorch 2.4.0
  • Datasets 3.0.1
  • Tokenizers 0.20.0
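
Because PEFT is listed among the framework versions and the hub page marks this repository as an adapter, loading it presumably means attaching the adapter weights to the base model. The following is a minimal sketch under that assumption. The adapter repo id `sujithatz/checkpoint_dir` is taken from the hub page, the function name is hypothetical, and no dtype or device settings are specified because the card does not document them.

```python
BASE_MODEL_ID = "microsoft/Phi-3.5-mini-instruct"  # base model named in this card
ADAPTER_ID = "sujithatz/checkpoint_dir"            # assumed adapter repo id from the hub page


def load_adapter_model(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the PEFT adapter weights on top of it."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the adapter weights stored in the adapter repository.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `load_adapter_model()` downloads both the base model and the adapter, so it needs network access and enough memory for the base model.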

Model tree for sujithatz/checkpoint_dir

  • Adapter: this model (288 adapters listed)