
phi3mini_128k_i_RE_QA_alpha16_r_16

This model is a fine-tuned version of microsoft/Phi-3-mini-4k-instruct on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4759

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
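
The effective batch size follows from the per-device batch size times the gradient-accumulation steps. A quick sanity check of that arithmetic (hyperparameter values copied from the list above; the dataset size is inferred from the training log, so treat it as an estimate):

```python
# Hyperparameters copied from the list above.
train_batch_size = 8
gradient_accumulation_steps = 4

# Effective (total) train batch size: each optimizer step consumes
# 4 accumulated micro-batches of 8 examples each.
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 32

# The log below reports step 100 at epoch 0.0297, so one epoch is
# roughly 100 / 0.0297 ≈ 3367 optimizer steps; at 32 examples per step
# that implies a training set on the order of ~108k examples (estimate).
steps_per_epoch = 100 / 0.0297
approx_examples = steps_per_epoch * total_train_batch_size
print(round(approx_examples))
```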

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 2.8436        | 0.0297 | 100   | 2.6247          |
| 1.791         | 0.0594 | 200   | 0.7624          |
| 0.6467        | 0.0890 | 300   | 0.5299          |
| 0.5517        | 0.1187 | 400   | 0.4906          |
| 0.5259        | 0.1484 | 500   | 0.4746          |
| 0.5142        | 0.1781 | 600   | 0.4653          |
| 0.5066        | 0.2077 | 700   | 0.4571          |
| 0.4989        | 0.2374 | 800   | 0.4520          |
| 0.4963        | 0.2671 | 900   | 0.4492          |
| 0.487         | 0.2968 | 1000  | 0.4463          |
| 0.4869        | 0.3265 | 1100  | 0.4440          |
| 0.4854        | 0.3561 | 1200  | 0.4424          |
| 0.4808        | 0.3858 | 1300  | 0.4407          |
| 0.4774        | 0.4155 | 1400  | 0.4394          |
| 0.4754        | 0.4452 | 1500  | 0.4378          |
| 0.4705        | 0.4748 | 1600  | 0.4376          |
| 0.471         | 0.5045 | 1700  | 0.4377          |
| 0.4663        | 0.5342 | 1800  | 0.4362          |
| 0.4644        | 0.5639 | 1900  | 0.4355          |
| 0.4624        | 0.5936 | 2000  | 0.4368          |
| 0.4623        | 0.6232 | 2100  | 0.4355          |
| 0.4514        | 0.6529 | 2200  | 0.4353          |
| 0.4542        | 0.6826 | 2300  | 0.4356          |
| 0.4535        | 0.7123 | 2400  | 0.4358          |
| 0.4499        | 0.7419 | 2500  | 0.4377          |
| 0.4499        | 0.7716 | 2600  | 0.4400          |
| 0.448         | 0.8013 | 2700  | 0.4353          |
| 0.4427        | 0.8310 | 2800  | 0.4389          |
| 0.4399        | 0.8607 | 2900  | 0.4416          |
| 0.4463        | 0.8903 | 3000  | 0.4404          |
| 0.4399        | 0.9200 | 3100  | 0.4415          |
| 0.432         | 0.9497 | 3200  | 0.4446          |
| 0.4336        | 0.9794 | 3300  | 0.4412          |
| 0.4299        | 1.0091 | 3400  | 0.4458          |
| 0.4236        | 1.0387 | 3500  | 0.4410          |
| 0.4235        | 1.0684 | 3600  | 0.4461          |
| 0.4204        | 1.0981 | 3700  | 0.4487          |
| 0.4179        | 1.1278 | 3800  | 0.4482          |
| 0.4158        | 1.1574 | 3900  | 0.4469          |
| 0.4226        | 1.1871 | 4000  | 0.4469          |
| 0.4116        | 1.2168 | 4100  | 0.4474          |
| 0.4137        | 1.2465 | 4200  | 0.4484          |
| 0.41          | 1.2762 | 4300  | 0.4531          |
| 0.4116        | 1.3058 | 4400  | 0.4536          |
| 0.4086        | 1.3355 | 4500  | 0.4500          |
| 0.4093        | 1.3652 | 4600  | 0.4497          |
| 0.4065        | 1.3949 | 4700  | 0.4566          |
| 0.4084        | 1.4245 | 4800  | 0.4540          |
| 0.4017        | 1.4542 | 4900  | 0.4529          |
| 0.4062        | 1.4839 | 5000  | 0.4519          |
| 0.3943        | 1.5136 | 5100  | 0.4555          |
| 0.4016        | 1.5433 | 5200  | 0.4544          |
| 0.4012        | 1.5729 | 5300  | 0.4556          |
| 0.3953        | 1.6026 | 5400  | 0.4566          |
| 0.4039        | 1.6323 | 5500  | 0.4586          |
| 0.3925        | 1.6620 | 5600  | 0.4556          |
| 0.3931        | 1.6916 | 5700  | 0.4581          |
| 0.3925        | 1.7213 | 5800  | 0.4608          |
| 0.3945        | 1.7510 | 5900  | 0.4561          |
| 0.3946        | 1.7807 | 6000  | 0.4569          |
| 0.3843        | 1.8104 | 6100  | 0.4629          |
| 0.3981        | 1.8400 | 6200  | 0.4640          |
| 0.3864        | 1.8697 | 6300  | 0.4640          |
| 0.3875        | 1.8994 | 6400  | 0.4616          |
| 0.39          | 1.9291 | 6500  | 0.4636          |
| 0.3871        | 1.9587 | 6600  | 0.4630          |
| 0.3887        | 1.9884 | 6700  | 0.4643          |
| 0.3853        | 2.0181 | 6800  | 0.4648          |
| 0.3826        | 2.0478 | 6900  | 0.4679          |
| 0.372         | 2.0775 | 7000  | 0.4688          |
| 0.3776        | 2.1071 | 7100  | 0.4661          |
| 0.3725        | 2.1368 | 7200  | 0.4685          |
| 0.3789        | 2.1665 | 7300  | 0.4693          |
| 0.3753        | 2.1962 | 7400  | 0.4707          |
| 0.3811        | 2.2258 | 7500  | 0.4708          |
| 0.3741        | 2.2555 | 7600  | 0.4694          |
| 0.3775        | 2.2852 | 7700  | 0.4683          |
| 0.3742        | 2.3149 | 7800  | 0.4729          |
| 0.3747        | 2.3446 | 7900  | 0.4728          |
| 0.3707        | 2.3742 | 8000  | 0.4698          |
| 0.3685        | 2.4039 | 8100  | 0.4739          |
| 0.3796        | 2.4336 | 8200  | 0.4720          |
| 0.3605        | 2.4633 | 8300  | 0.4750          |
| 0.3711        | 2.4930 | 8400  | 0.4729          |
| 0.3673        | 2.5226 | 8500  | 0.4743          |
| 0.3715        | 2.5523 | 8600  | 0.4742          |
| 0.3693        | 2.5820 | 8700  | 0.4735          |
| 0.367         | 2.6117 | 8800  | 0.4748          |
| 0.3731        | 2.6413 | 8900  | 0.4736          |
| 0.3685        | 2.6710 | 9000  | 0.4744          |
| 0.3716        | 2.7007 | 9100  | 0.4755          |
| 0.3698        | 2.7304 | 9200  | 0.4743          |
| 0.3617        | 2.7601 | 9300  | 0.4748          |
| 0.3666        | 2.7897 | 9400  | 0.4745          |
| 0.3642        | 2.8194 | 9500  | 0.4756          |
| 0.3654        | 2.8491 | 9600  | 0.4756          |
| 0.3672        | 2.8788 | 9700  | 0.4752          |
| 0.3621        | 2.9084 | 9800  | 0.4751          |
| 0.3639        | 2.9381 | 9900  | 0.4764          |
| 0.3664        | 2.9678 | 10000 | 0.4757          |
| 0.365         | 2.9975 | 10100 | 0.4759          |
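
Validation loss bottoms out around step 2200 (≈0.65 epochs) and drifts upward afterwards, which usually signals mild overfitting. Picking the best checkpoint from a log like this is a one-liner; the rows below are a hand-copied subset of the table above:

```python
# (step, validation_loss) pairs hand-copied from the training log above.
log = [
    (1900, 0.4355),
    (2100, 0.4355),
    (2200, 0.4353),
    (2700, 0.4353),
    (5000, 0.4519),
    (10100, 0.4759),
]

# min() compares (loss, step) tuples, so ties resolve to the earliest step.
best_loss, best_step = min((loss, step) for step, loss in log)
print(best_step, best_loss)  # 2200 0.4353
```

In an actual training run the same effect is usually achieved by setting `load_best_model_at_end=True` and `metric_for_best_model="eval_loss"` in `TrainingArguments`.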

Framework versions

  • PEFT 0.11.1
  • Transformers 4.41.2
  • PyTorch 2.2.1
  • Datasets 2.19.2
  • Tokenizers 0.19.1
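
This repository contains a PEFT adapter rather than full model weights, so using it means loading the base model first and then attaching the adapter on top. A minimal sketch, assuming `transformers` and `peft` at roughly the versions listed above (the adapter repo id is a placeholder, not the real one):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "microsoft/Phi-3-mini-4k-instruct"
# Placeholder repo id (assumption) - substitute this model's actual hub id.
adapter_id = "your-username/phi3mini_128k_i_RE_QA_alpha16_r_16"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Attach the LoRA adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()
```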