license: mit
library_name: peft
tags:
  - trl
  - sft
  - generated_from_trainer
datasets:
  - generator
base_model: microsoft/Phi-3-mini-4k-instruct
model-index:
  - name: phi-ft-1000000-fp-newsplit
    results: []

phi-ft-1000000-fp-newsplit

This model is a fine-tuned version of microsoft/Phi-3-mini-4k-instruct on the generator dataset. It achieves the following results on the evaluation set:

  • Loss: 1.7754
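
For easier comparison with other language models, the loss above can be converted to a perplexity. This is a minimal sketch, assuming the reported value is a mean per-token cross-entropy in nats (the usual convention for causal language model training, though the card does not state it). The helper name `perplexity` is illustrative and not part of any library.

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy)


final_eval_loss = 1.7754  # evaluation loss reported in this card
print(f"perplexity ~ {perplexity(final_eval_loss):.2f}")
```

Under that assumption the evaluation loss corresponds to a perplexity of roughly 5.9.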

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 0
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.2
  • num_epochs: 1
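
The learning-rate settings above can be illustrated with a small pure-Python sketch of a linear warmup followed by a half-cosine decay, which is the shape a cosine scheduler with a warmup ratio commonly produces. Several things here are assumptions rather than facts from the card: the total step count is inferred from the results table (step 8700 is logged at epoch 0.9959), the warmup length is taken as the ceiling of `total_steps * warmup_ratio`, and the name `lr_at` is hypothetical.

```python
import math

PEAK_LR = 5e-05      # learning_rate from the list above
WARMUP_RATIO = 0.2   # lr_scheduler_warmup_ratio from the list above
# Assumption: total optimizer steps inferred from the results table
# (step 8700 logged at epoch 0.9959); the card does not state this number.
TOTAL_STEPS = round(8700 / 0.9959)


def lr_at(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at an optimizer step under linear warmup + cosine decay."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, warmup_steps)
    # Half-cosine decay from the peak down to 0 at the final step.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 500, 1748, 4000, TOTAL_STEPS):
    print(step, f"{lr_at(step):.2e}")
```

With these assumed numbers the warmup would last about 1748 steps, so the first several hundred logged steps would run at well below the peak learning rate.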

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 3.1002        | 0.0114 | 100  | 3.0505          |
| 2.1929        | 0.0229 | 200  | 2.0493          |
| 1.6369        | 0.0343 | 300  | 1.6432          |
| 1.4618        | 0.0458 | 400  | 1.5580          |
| 1.317         | 0.0572 | 500  | 1.5410          |
| 1.1329        | 0.0687 | 600  | 1.6269          |
| 0.9505        | 0.0801 | 700  | 1.7387          |
| 0.8334        | 0.0916 | 800  | 1.7443          |
| 0.7692        | 0.1030 | 900  | 1.7634          |
| 0.6983        | 0.1145 | 1000 | 1.7546          |
| 0.6859        | 0.1259 | 1100 | 1.7593          |
| 0.6671        | 0.1374 | 1200 | 1.7647          |
| 0.6285        | 0.1488 | 1300 | 1.7951          |
| 0.6121        | 0.1603 | 1400 | 1.7816          |
| 0.5923        | 0.1717 | 1500 | 1.8132          |
| 0.5908        | 0.1832 | 1600 | 1.7664          |
| 0.5662        | 0.1946 | 1700 | 1.8307          |
| 0.5637        | 0.2060 | 1800 | 1.7864          |
| 0.5475        | 0.2175 | 1900 | 1.7988          |
| 0.5421        | 0.2289 | 2000 | 1.7876          |
| 0.529         | 0.2404 | 2100 | 1.7661          |
| 0.5202        | 0.2518 | 2200 | 1.7709          |
| 0.5287        | 0.2633 | 2300 | 1.7681          |
| 0.514         | 0.2747 | 2400 | 1.7765          |
| 0.5026        | 0.2862 | 2500 | 1.7931          |
| 0.5038        | 0.2976 | 2600 | 1.7808          |
| 0.5052        | 0.3091 | 2700 | 1.7689          |
| 0.4918        | 0.3205 | 2800 | 1.7862          |
| 0.4817        | 0.3320 | 2900 | 1.7916          |
| 0.4806        | 0.3434 | 3000 | 1.7796          |
| 0.4849        | 0.3549 | 3100 | 1.7654          |
| 0.4784        | 0.3663 | 3200 | 1.7576          |
| 0.4712        | 0.3777 | 3300 | 1.7746          |
| 0.4715        | 0.3892 | 3400 | 1.7568          |
| 0.4608        | 0.4006 | 3500 | 1.7424          |
| 0.4629        | 0.4121 | 3600 | 1.7561          |
| 0.4591        | 0.4235 | 3700 | 1.7498          |
| 0.4652        | 0.4350 | 3800 | 1.7366          |
| 0.461         | 0.4464 | 3900 | 1.7394          |
| 0.4469        | 0.4579 | 4000 | 1.7397          |
| 0.4521        | 0.4693 | 4100 | 1.7555          |
| 0.4498        | 0.4808 | 4200 | 1.7652          |
| 0.4541        | 0.4922 | 4300 | 1.7583          |
| 0.4594        | 0.5037 | 4400 | 1.7605          |
| 0.4514        | 0.5151 | 4500 | 1.7686          |
| 0.4395        | 0.5266 | 4600 | 1.7714          |
| 0.4384        | 0.5380 | 4700 | 1.7889          |
| 0.4392        | 0.5495 | 4800 | 1.7709          |
| 0.4495        | 0.5609 | 4900 | 1.7554          |
| 0.4375        | 0.5723 | 5000 | 1.7532          |
| 0.4441        | 0.5838 | 5100 | 1.7770          |
| 0.4458        | 0.5952 | 5200 | 1.7528          |
| 0.4343        | 0.6067 | 5300 | 1.7646          |
| 0.433         | 0.6181 | 5400 | 1.7689          |
| 0.4371        | 0.6296 | 5500 | 1.7738          |
| 0.4376        | 0.6410 | 5600 | 1.7633          |
| 0.4366        | 0.6525 | 5700 | 1.7810          |
| 0.43          | 0.6639 | 5800 | 1.7685          |
| 0.4345        | 0.6754 | 5900 | 1.7761          |
| 0.4379        | 0.6868 | 6000 | 1.7782          |
| 0.4294        | 0.6983 | 6100 | 1.7737          |
| 0.4441        | 0.7097 | 6200 | 1.7646          |
| 0.4396        | 0.7212 | 6300 | 1.7779          |
| 0.4307        | 0.7326 | 6400 | 1.7766          |
| 0.4331        | 0.7440 | 6500 | 1.7733          |
| 0.4326        | 0.7555 | 6600 | 1.7796          |
| 0.4286        | 0.7669 | 6700 | 1.7803          |
| 0.4294        | 0.7784 | 6800 | 1.7787          |
| 0.4294        | 0.7898 | 6900 | 1.7795          |
| 0.4364        | 0.8013 | 7000 | 1.7765          |
| 0.4414        | 0.8127 | 7100 | 1.7783          |
| 0.4336        | 0.8242 | 7200 | 1.7746          |
| 0.4324        | 0.8356 | 7300 | 1.7728          |
| 0.4414        | 0.8471 | 7400 | 1.7765          |
| 0.4288        | 0.8585 | 7500 | 1.7792          |
| 0.4359        | 0.8700 | 7600 | 1.7776          |
| 0.4242        | 0.8814 | 7700 | 1.7762          |
| 0.4413        | 0.8929 | 7800 | 1.7751          |
| 0.4402        | 0.9043 | 7900 | 1.7754          |
| 0.4452        | 0.9158 | 8000 | 1.7750          |
| 0.4346        | 0.9272 | 8100 | 1.7755          |
| 0.4396        | 0.9386 | 8200 | 1.7751          |
| 0.44          | 0.9501 | 8300 | 1.7752          |
| 0.4333        | 0.9615 | 8400 | 1.7753          |
| 0.4348        | 0.9730 | 8500 | 1.7754          |
| 0.4331        | 0.9844 | 8600 | 1.7752          |
| 0.4326        | 0.9959 | 8700 | 1.7754          |
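
One way to read the results above is to locate the step with the lowest validation loss and compare it with the final logged step. The sketch below does this on a short excerpt of the logged rows; the helper `best_by_validation` is a hypothetical name, and treating the widening gap between training and validation loss as a sign of overfitting is a common interpretation rather than something the card states.

```python
# (training_loss, epoch, step, validation_loss): an excerpt of the logged rows
rows = [
    (3.1002, 0.0114, 100, 3.0505),
    (2.1929, 0.0229, 200, 2.0493),
    (1.6369, 0.0343, 300, 1.6432),
    (1.4618, 0.0458, 400, 1.5580),
    (1.3170, 0.0572, 500, 1.5410),
    (1.1329, 0.0687, 600, 1.6269),
    (0.9505, 0.0801, 700, 1.7387),
    (0.4652, 0.4350, 3800, 1.7366),
    (0.4326, 0.9959, 8700, 1.7754),
]


def best_by_validation(rows):
    """Return the logged row with the lowest validation loss."""
    return min(rows, key=lambda row: row[3])


best = best_by_validation(rows)
final = rows[-1]
# Gap between validation and training loss at the final logged step.
gap = final[3] - final[0]

print(f"lowest validation loss {best[3]:.4f} at step {best[2]}")
print(f"final step {final[2]}: validation - training = {gap:.4f}")
```

Applied to the full set of logged rows, the same selection picks step 500 (validation loss 1.5410), while at the last logged step the validation loss is 1.7754 against a training loss of 0.4326.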

Framework versions

  • PEFT 0.10.0
  • Transformers 4.40.0
  • PyTorch 2.3.0+cu121
  • Datasets 2.16.0
  • Tokenizers 0.19.1