# V0414H4
This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0509
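
If this loss is a token-level cross-entropy (the usual case for causal LM fine-tuning), it corresponds to a perplexity of exp(0.0509) ≈ 1.05 on the evaluation set. A minimal inference sketch follows, assuming the repository hosts full model weights; if it only contains a PEFT/LoRA adapter, load `microsoft/phi-2` first and attach the adapter with the `peft` library instead:

```python
# Minimal inference sketch (assumption: this repo hosts full model weights,
# not a PEFT adapter). The prompt below is a generic placeholder; the
# dataset and prompt format used for fine-tuning are not documented.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Litzy619/V0414H4"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,  # phi-2's modeling code was loaded remotely in transformers 4.36
)

inputs = tokenizer("Hello, world.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```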
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):
- learning_rate: 0.003
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine_with_restarts
- lr_scheduler_warmup_steps: 100
- num_epochs: 4
- mixed_precision_training: Native AMP
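
For reference, these settings map onto `transformers.TrainingArguments` roughly as follows. The output path and the omitted model/data preparation are placeholders, not the author's actual training script; the Adam betas and epsilon listed above are the library defaults, so they need no explicit arguments:

```python
# Hedged sketch: the hyperparameters above expressed as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="V0414H4",                      # placeholder output path
    learning_rate=3e-3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=8,             # 8 * 8 = 64 effective train batch size
    lr_scheduler_type="cosine_with_restarts",
    warmup_steps=100,
    num_train_epochs=4,
    fp16=True,                                 # Native AMP mixed precision
)
```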
### Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
2.2233 | 0.05 | 10 | 1.3485 |
0.7462 | 0.09 | 20 | 0.1747 |
0.1543 | 0.14 | 30 | 0.1288 |
0.1245 | 0.18 | 40 | 0.1069 |
0.1035 | 0.23 | 50 | 0.0897 |
0.0977 | 0.27 | 60 | 0.0858 |
0.0917 | 0.32 | 70 | 0.0843 |
0.1014 | 0.36 | 80 | 0.0814 |
0.0817 | 0.41 | 90 | 0.0802 |
0.0879 | 0.45 | 100 | 0.0743 |
0.0858 | 0.5 | 110 | 0.0767 |
0.088 | 0.54 | 120 | 0.0740 |
0.0846 | 0.59 | 130 | 0.0735 |
0.0891 | 0.63 | 140 | 0.0766 |
0.0862 | 0.68 | 150 | 0.0794 |
0.0813 | 0.73 | 160 | 0.0842 |
0.0859 | 0.77 | 170 | 0.0706 |
0.0806 | 0.82 | 180 | 0.0753 |
0.092 | 0.86 | 190 | 0.0716 |
0.0727 | 0.91 | 200 | 0.0709 |
0.1142 | 0.95 | 210 | 0.0756 |
0.0861 | 1.0 | 220 | 0.0733 |
0.0673 | 1.04 | 230 | 0.0680 |
0.0599 | 1.09 | 240 | 0.0643 |
0.1244 | 1.13 | 250 | 0.0694 |
0.0724 | 1.18 | 260 | 0.0726 |
0.0712 | 1.22 | 270 | 0.0596 |
0.0544 | 1.27 | 280 | 0.0706 |
0.081 | 1.31 | 290 | 0.0648 |
0.0931 | 1.36 | 300 | 0.0632 |
0.0736 | 1.41 | 310 | 0.0566 |
0.0631 | 1.45 | 320 | 0.0566 |
0.7605 | 1.5 | 330 | 0.7501 |
0.1829 | 1.54 | 340 | 0.0805 |
0.0928 | 1.59 | 350 | 0.0756 |
0.4824 | 1.63 | 360 | 0.1228 |
0.0929 | 1.68 | 370 | 0.0644 |
0.0735 | 1.72 | 380 | 0.0858 |
0.0856 | 1.77 | 390 | 0.0622 |
0.0689 | 1.81 | 400 | 0.0668 |
0.0674 | 1.86 | 410 | 0.0658 |
0.0694 | 1.9 | 420 | 0.0648 |
0.0729 | 1.95 | 430 | 0.0670 |
0.0756 | 1.99 | 440 | 0.0759 |
0.0695 | 2.04 | 450 | 0.0648 |
0.0617 | 2.08 | 460 | 0.0557 |
0.0617 | 2.13 | 470 | 0.0591 |
0.0588 | 2.18 | 480 | 0.0604 |
0.0549 | 2.22 | 490 | 0.0582 |
0.0494 | 2.27 | 500 | 0.0672 |
0.0675 | 2.31 | 510 | 0.0673 |
0.1043 | 2.36 | 520 | 0.0938 |
0.0762 | 2.4 | 530 | 0.0614 |
0.0661 | 2.45 | 540 | 0.0593 |
0.0619 | 2.49 | 550 | 0.0561 |
0.0607 | 2.54 | 560 | 0.0531 |
0.0522 | 2.58 | 570 | 0.0538 |
0.0728 | 2.63 | 580 | 0.0539 |
0.0472 | 2.67 | 590 | 0.0540 |
0.0522 | 2.72 | 600 | 0.0519 |
0.0507 | 2.76 | 610 | 0.0479 |
0.0518 | 2.81 | 620 | 0.0488 |
0.0487 | 2.86 | 630 | 0.0498 |
0.0505 | 2.9 | 640 | 0.0532 |
0.0445 | 2.95 | 650 | 0.0508 |
0.0455 | 2.99 | 660 | 0.0525 |
0.0459 | 3.04 | 670 | 0.0529 |
0.04 | 3.08 | 680 | 0.0527 |
0.035 | 3.13 | 690 | 0.0524 |
0.0556 | 3.17 | 700 | 0.0516 |
0.0354 | 3.22 | 710 | 0.0513 |
0.038 | 3.26 | 720 | 0.0508 |
0.0348 | 3.31 | 730 | 0.0530 |
0.0358 | 3.35 | 740 | 0.0538 |
0.0434 | 3.4 | 750 | 0.0542 |
0.0443 | 3.44 | 760 | 0.0520 |
0.0417 | 3.49 | 770 | 0.0509 |
0.0437 | 3.54 | 780 | 0.0502 |
0.0384 | 3.58 | 790 | 0.0510 |
0.0388 | 3.63 | 800 | 0.0510 |
0.0341 | 3.67 | 810 | 0.0506 |
0.0397 | 3.72 | 820 | 0.0509 |
0.0353 | 3.76 | 830 | 0.0507 |
0.0364 | 3.81 | 840 | 0.0508 |
0.0381 | 3.85 | 850 | 0.0508 |
0.0268 | 3.9 | 860 | 0.0509 |
0.0364 | 3.94 | 870 | 0.0509 |
0.044 | 3.99 | 880 | 0.0509 |
### Framework versions
- Transformers 4.36.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1
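
A quick runtime check against these versions can look like the sketch below. Note that 4.36.0.dev0 was a development build; treating the nearest stable release (4.36.0) as equivalent is an assumption:

```python
# Environment check (sketch): compare installed versions with those used
# for training. Exact matches for +cu121 and .dev0 builds may not be
# available from PyPI, so minor mismatches are expected.
import datasets
import tokenizers
import torch
import transformers

trained_with = {
    "transformers": "4.36.0.dev0",
    "torch": "2.1.2+cu121",
    "datasets": "2.14.6",
    "tokenizers": "0.14.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for pkg, want in trained_with.items():
    print(f"{pkg}: installed {installed[pkg]}, trained with {want}")
```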