
# MPT_1000_STEPS_1e8_rate_01_beta_DPO

This model is a fine-tuned version of [mosaicml/mpt-7b-instruct](https://huggingface.co/mosaicml/mpt-7b-instruct) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.6939
- Rewards/chosen: -0.0008
- Rewards/rejected: 0.0005
- Rewards/accuracies: 0.4747
- Rewards/margins: -0.0013
- Logps/rejected: -21.5525
- Logps/chosen: -20.8004
- Logits/rejected: 14.2517
- Logits/chosen: 14.2543
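
For context, these metrics are the quantities tracked by the standard DPO objective (Rafailov et al., 2023), sketched below. Note that β = 0.1 is inferred from the "01_beta" suffix in the model name, not stated on this card:

$$
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\left[ \log \sigma\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right]
$$

Rewards/chosen and Rewards/rejected are the β-scaled policy-to-reference log-ratios for the preferred and rejected completions, Rewards/margins is their difference, and Rewards/accuracies is the fraction of evaluation pairs whose chosen reward exceeds the rejected reward. Margins near zero and accuracy near 0.5, as reported here, suggest the policy barely moved from the reference model, consistent with the 1e-08 learning rate.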

## Model description

More information needed

## Intended uses & limitations

More information needed
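
In the absence of author-provided guidance, below is a minimal inference sketch assuming the standard transformers causal-LM API. The repository id is a placeholder for wherever this checkpoint is hosted, and `trust_remote_code=True` is required because MPT models ship custom modeling code:

```python
# Minimal inference sketch. Assumptions: the repo id is a placeholder;
# MPT checkpoints need trust_remote_code=True for their custom architecture.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MPT_1000_STEPS_1e8_rate_01_beta_DPO"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # weights are published in FP16
    trust_remote_code=True,
)

inputs = tokenizer("Explain direct preference optimization in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```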

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-08
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
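
For reproducibility, these settings map naturally onto TRL's `DPOTrainer`. The sketch below is a reconstruction under stated assumptions, not the author's script: the preference dataset id and its column names are hypothetical (the card lists the dataset as unknown), and `beta=0.1` is inferred from the model name:

```python
# DPO training sketch. Assumptions: TRL's DPOTrainer API circa early 2024;
# a hypothetical preference dataset with "prompt"/"chosen"/"rejected" columns.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mosaicml/mpt-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # MPT's tokenizer has no pad token by default
model = AutoModelForCausalLM.from_pretrained(base, trust_remote_code=True)

train_dataset = load_dataset("your/preference-dataset", split="train")  # hypothetical

args = TrainingArguments(
    output_dir="MPT_1000_STEPS_1e8_rate_01_beta_DPO",
    learning_rate=1e-8,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # 2 x 2 = total train batch size of 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    fp16=True,
)

trainer = DPOTrainer(
    model,
    ref_model=None,       # TRL clones the policy as a frozen reference model
    args=args,
    beta=0.1,             # inferred from "01_beta" in the model name
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

The optimizer entry in the list above is simply the AdamW default in transformers (betas=(0.9, 0.999), epsilon=1e-08), so it needs no explicit configuration here.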

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6939 | 0.05 | 50 | 0.6936 | -0.0008 | -0.0000 | 0.4791 | -0.0007 | -21.5578 | -20.7998 | 14.2549 | 14.2575 |
| 0.6946 | 0.1 | 100 | 0.6930 | -0.0001 | -0.0005 | 0.4923 | 0.0004 | -21.5626 | -20.7929 | 14.2554 | 14.2579 |
| 0.6931 | 0.15 | 150 | 0.6935 | -0.0014 | -0.0009 | 0.4967 | -0.0005 | -21.5666 | -20.8066 | 14.2572 | 14.2598 |
| 0.6917 | 0.2 | 200 | 0.6929 | 0.0003 | -0.0004 | 0.4813 | 0.0006 | -21.5611 | -20.7895 | 14.2562 | 14.2588 |
| 0.6954 | 0.24 | 250 | 0.6940 | -0.0021 | -0.0006 | 0.4857 | -0.0015 | -21.5632 | -20.8129 | 14.2623 | 14.2649 |
| 0.6932 | 0.29 | 300 | 0.6931 | -0.0008 | -0.0009 | 0.4967 | 0.0001 | -21.5667 | -20.8001 | 14.2610 | 14.2636 |
| 0.6954 | 0.34 | 350 | 0.6934 | -0.0012 | -0.0009 | 0.5011 | -0.0003 | -21.5662 | -20.8041 | 14.2641 | 14.2667 |
| 0.6891 | 0.39 | 400 | 0.6945 | -0.0025 | 0.0001 | 0.4725 | -0.0026 | -21.5566 | -20.8174 | 14.2546 | 14.2572 |
| 0.6917 | 0.44 | 450 | 0.6935 | -0.0011 | -0.0005 | 0.4593 | -0.0006 | -21.5622 | -20.8030 | 14.2562 | 14.2588 |
| 0.6908 | 0.49 | 500 | 0.6936 | -0.0018 | -0.0010 | 0.4813 | -0.0007 | -21.5679 | -20.8101 | 14.2507 | 14.2533 |
| 0.6927 | 0.54 | 550 | 0.6941 | -0.0016 | 0.0001 | 0.4681 | -0.0017 | -21.5560 | -20.8079 | 14.2549 | 14.2575 |
| 0.6923 | 0.59 | 600 | 0.6939 | -0.0011 | 0.0003 | 0.4527 | -0.0014 | -21.5542 | -20.8035 | 14.2539 | 14.2565 |
| 0.6946 | 0.64 | 650 | 0.6944 | -0.0013 | 0.0012 | 0.4593 | -0.0024 | -21.5459 | -20.8048 | 14.2527 | 14.2553 |
| 0.6918 | 0.68 | 700 | 0.6934 | -0.0002 | 0.0002 | 0.4747 | -0.0004 | -21.5558 | -20.7942 | 14.2531 | 14.2557 |
| 0.6923 | 0.73 | 750 | 0.6939 | -0.0011 | 0.0002 | 0.4967 | -0.0013 | -21.5551 | -20.8028 | 14.2522 | 14.2547 |
| 0.6895 | 0.78 | 800 | 0.6937 | -0.0006 | 0.0004 | 0.4945 | -0.0010 | -21.5532 | -20.7977 | 14.2513 | 14.2539 |
| 0.6936 | 0.83 | 850 | 0.6937 | -0.0007 | 0.0003 | 0.4945 | -0.0010 | -21.5541 | -20.7990 | 14.2516 | 14.2542 |
| 0.6904 | 0.88 | 900 | 0.6939 | -0.0008 | 0.0005 | 0.4747 | -0.0013 | -21.5525 | -20.8004 | 14.2517 | 14.2543 |
| 0.6926 | 0.93 | 950 | 0.6939 | -0.0008 | 0.0005 | 0.4747 | -0.0013 | -21.5525 | -20.8004 | 14.2517 | 14.2543 |
| 0.691 | 0.98 | 1000 | 0.6939 | -0.0008 | 0.0005 | 0.4747 | -0.0013 | -21.5525 | -20.8004 | 14.2517 | 14.2543 |

### Framework versions

- Transformers 4.39.1
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2