# phi3mini_128k_i_RE_QA_alpha16_r_16
This model is a fine-tuned version of [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) on an unspecified dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4759
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
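The schedule above (`lr_scheduler_type: linear` with `lr_scheduler_warmup_ratio: 0.1`) warms up linearly to the peak learning rate and then decays linearly to zero. A minimal sketch in plain Python, assuming roughly 10,100 optimizer steps over the 3 epochs (read off the last row of the training log; treat it as an approximation):

```python
# Sketch of the linear warmup + linear decay schedule used above.
# TOTAL_STEPS is an assumption derived from the training log (~10,100 steps).
LEARNING_RATE = 1e-4
TOTAL_STEPS = 10_100
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)  # warmup_ratio 0.1 -> ~1,010 steps

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step: ramp up, then decay to 0."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    return LEARNING_RATE * max(0.0, (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS))

# Effective batch size: per-device batch 8 x gradient accumulation 4 = 32,
# matching total_train_batch_size above.
effective_batch = 8 * 4
```

This mirrors the behavior of `get_linear_schedule_with_warmup` in Transformers, which the Trainer selects for `lr_scheduler_type: linear`.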
### Training results

Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
2.8436 | 0.0297 | 100 | 2.6247 |
1.791 | 0.0594 | 200 | 0.7624 |
0.6467 | 0.0890 | 300 | 0.5299 |
0.5517 | 0.1187 | 400 | 0.4906 |
0.5259 | 0.1484 | 500 | 0.4746 |
0.5142 | 0.1781 | 600 | 0.4653 |
0.5066 | 0.2077 | 700 | 0.4571 |
0.4989 | 0.2374 | 800 | 0.4520 |
0.4963 | 0.2671 | 900 | 0.4492 |
0.487 | 0.2968 | 1000 | 0.4463 |
0.4869 | 0.3265 | 1100 | 0.4440 |
0.4854 | 0.3561 | 1200 | 0.4424 |
0.4808 | 0.3858 | 1300 | 0.4407 |
0.4774 | 0.4155 | 1400 | 0.4394 |
0.4754 | 0.4452 | 1500 | 0.4378 |
0.4705 | 0.4748 | 1600 | 0.4376 |
0.471 | 0.5045 | 1700 | 0.4377 |
0.4663 | 0.5342 | 1800 | 0.4362 |
0.4644 | 0.5639 | 1900 | 0.4355 |
0.4624 | 0.5936 | 2000 | 0.4368 |
0.4623 | 0.6232 | 2100 | 0.4355 |
0.4514 | 0.6529 | 2200 | 0.4353 |
0.4542 | 0.6826 | 2300 | 0.4356 |
0.4535 | 0.7123 | 2400 | 0.4358 |
0.4499 | 0.7419 | 2500 | 0.4377 |
0.4499 | 0.7716 | 2600 | 0.4400 |
0.448 | 0.8013 | 2700 | 0.4353 |
0.4427 | 0.8310 | 2800 | 0.4389 |
0.4399 | 0.8607 | 2900 | 0.4416 |
0.4463 | 0.8903 | 3000 | 0.4404 |
0.4399 | 0.9200 | 3100 | 0.4415 |
0.432 | 0.9497 | 3200 | 0.4446 |
0.4336 | 0.9794 | 3300 | 0.4412 |
0.4299 | 1.0091 | 3400 | 0.4458 |
0.4236 | 1.0387 | 3500 | 0.4410 |
0.4235 | 1.0684 | 3600 | 0.4461 |
0.4204 | 1.0981 | 3700 | 0.4487 |
0.4179 | 1.1278 | 3800 | 0.4482 |
0.4158 | 1.1574 | 3900 | 0.4469 |
0.4226 | 1.1871 | 4000 | 0.4469 |
0.4116 | 1.2168 | 4100 | 0.4474 |
0.4137 | 1.2465 | 4200 | 0.4484 |
0.41 | 1.2762 | 4300 | 0.4531 |
0.4116 | 1.3058 | 4400 | 0.4536 |
0.4086 | 1.3355 | 4500 | 0.4500 |
0.4093 | 1.3652 | 4600 | 0.4497 |
0.4065 | 1.3949 | 4700 | 0.4566 |
0.4084 | 1.4245 | 4800 | 0.4540 |
0.4017 | 1.4542 | 4900 | 0.4529 |
0.4062 | 1.4839 | 5000 | 0.4519 |
0.3943 | 1.5136 | 5100 | 0.4555 |
0.4016 | 1.5433 | 5200 | 0.4544 |
0.4012 | 1.5729 | 5300 | 0.4556 |
0.3953 | 1.6026 | 5400 | 0.4566 |
0.4039 | 1.6323 | 5500 | 0.4586 |
0.3925 | 1.6620 | 5600 | 0.4556 |
0.3931 | 1.6916 | 5700 | 0.4581 |
0.3925 | 1.7213 | 5800 | 0.4608 |
0.3945 | 1.7510 | 5900 | 0.4561 |
0.3946 | 1.7807 | 6000 | 0.4569 |
0.3843 | 1.8104 | 6100 | 0.4629 |
0.3981 | 1.8400 | 6200 | 0.4640 |
0.3864 | 1.8697 | 6300 | 0.4640 |
0.3875 | 1.8994 | 6400 | 0.4616 |
0.39 | 1.9291 | 6500 | 0.4636 |
0.3871 | 1.9587 | 6600 | 0.4630 |
0.3887 | 1.9884 | 6700 | 0.4643 |
0.3853 | 2.0181 | 6800 | 0.4648 |
0.3826 | 2.0478 | 6900 | 0.4679 |
0.372 | 2.0775 | 7000 | 0.4688 |
0.3776 | 2.1071 | 7100 | 0.4661 |
0.3725 | 2.1368 | 7200 | 0.4685 |
0.3789 | 2.1665 | 7300 | 0.4693 |
0.3753 | 2.1962 | 7400 | 0.4707 |
0.3811 | 2.2258 | 7500 | 0.4708 |
0.3741 | 2.2555 | 7600 | 0.4694 |
0.3775 | 2.2852 | 7700 | 0.4683 |
0.3742 | 2.3149 | 7800 | 0.4729 |
0.3747 | 2.3446 | 7900 | 0.4728 |
0.3707 | 2.3742 | 8000 | 0.4698 |
0.3685 | 2.4039 | 8100 | 0.4739 |
0.3796 | 2.4336 | 8200 | 0.4720 |
0.3605 | 2.4633 | 8300 | 0.4750 |
0.3711 | 2.4930 | 8400 | 0.4729 |
0.3673 | 2.5226 | 8500 | 0.4743 |
0.3715 | 2.5523 | 8600 | 0.4742 |
0.3693 | 2.5820 | 8700 | 0.4735 |
0.367 | 2.6117 | 8800 | 0.4748 |
0.3731 | 2.6413 | 8900 | 0.4736 |
0.3685 | 2.6710 | 9000 | 0.4744 |
0.3716 | 2.7007 | 9100 | 0.4755 |
0.3698 | 2.7304 | 9200 | 0.4743 |
0.3617 | 2.7601 | 9300 | 0.4748 |
0.3666 | 2.7897 | 9400 | 0.4745 |
0.3642 | 2.8194 | 9500 | 0.4756 |
0.3654 | 2.8491 | 9600 | 0.4756 |
0.3672 | 2.8788 | 9700 | 0.4752 |
0.3621 | 2.9084 | 9800 | 0.4751 |
0.3639 | 2.9381 | 9900 | 0.4764 |
0.3664 | 2.9678 | 10000 | 0.4757 |
0.365 | 2.9975 | 10100 | 0.4759 |
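Note that validation loss bottoms out at 0.4353 around steps 2200–2700 and drifts upward afterwards, so the final checkpoint (0.4759) is not the best one; selecting the checkpoint with the lowest eval loss would likely be preferable. A sketch over a hand-copied subset of the log rows above:

```python
# Step -> validation loss, a subset of the training log above.
eval_log = {
    100: 2.6247,
    1000: 0.4463,
    2200: 0.4353,
    2700: 0.4353,
    5000: 0.4519,
    10100: 0.4759,
}

# Pick the checkpoint with the lowest validation loss (earliest on ties).
best_step = min(eval_log, key=eval_log.get)
best_loss = eval_log[best_step]
```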
### Framework versions

- PEFT 0.11.1
- Transformers 4.41.2
- Pytorch 2.2.1
- Datasets 2.19.2
- Tokenizers 0.19.1
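To reproduce the training environment, the versions above can be pinned in a requirements file (note that the pip package for PyTorch is `torch`):

```text
peft==0.11.1
transformers==4.41.2
torch==2.2.1
datasets==2.19.2
tokenizers==0.19.1
```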