
MPT_1000_STEPS_1e7_rate_03_beta_DPO

This model is a DPO fine-tuned version of mosaicml/mpt-7b-instruct on an unknown preference dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6924
  • Rewards/chosen: -0.0146
  • Rewards/rejected: -0.0175
  • Rewards/accuracies: 0.5275
  • Rewards/margins: 0.0029
  • Logps/rejected: -21.6159
  • Logps/chosen: -20.8410
  • Logits/rejected: 14.2241
  • Logits/chosen: 14.2267
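
For context (not stated in the card itself), these Rewards/* metrics follow the standard DPO formulation (Rafailov et al., 2023): the implicit reward is the beta-scaled log-probability ratio of the policy against the frozen reference model, Rewards/margins is the mean difference between chosen and rejected rewards, and Rewards/accuracies is the fraction of evaluation pairs where the chosen reward exceeds the rejected one. A sketch of the usual sigmoid-loss variant:

```latex
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\text{ref}}(y \mid x)}
\qquad
\mathcal{L}_{\text{DPO}} = -\log \sigma\bigl( r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}}) \bigr)
```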

Model description

More information needed

Intended uses & limitations

More information needed
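
In the absence of documented usage guidance, the checkpoint should load like any MPT-based causal language model. A minimal inference sketch, assuming the checkpoint is published under this repository name (substitute the actual Hub id) and retains MPT's custom modeling code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; replace with the actual Hub path of this checkpoint.
model_id = "MPT_1000_STEPS_1e7_rate_03_beta_DPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # the checkpoint is stored in FP16
    trust_remote_code=True,     # MPT models ship custom modeling code
)

prompt = "Explain direct preference optimization in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```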

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-07
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
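
These settings map naturally onto TRL's DPOTrainer. The sketch below is an assumption-laden reconstruction, not the author's confirmed script: it assumes a TRL release contemporary with Transformers 4.39 (where beta is passed to the trainer directly), infers beta=0.3 from the "03_beta" in the model name, and uses a placeholder preference dataset since the real data is unknown.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mosaicml/mpt-7b-instruct"
model = AutoModelForCausalLM.from_pretrained(base, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder: the actual preference dataset is unknown.
preference_dataset = Dataset.from_dict(
    {"prompt": ["..."], "chosen": ["..."], "rejected": ["..."]}
)

args = TrainingArguments(
    output_dir="MPT_1000_STEPS_1e7_rate_03_beta_DPO",
    learning_rate=1e-7,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size 4
    max_steps=1000,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    seed=42,
    # Adam betas=(0.9, 0.999) and epsilon=1e-8 are the defaults.
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,                    # TRL clones a frozen reference model when None
    beta=0.3,                          # assumption: inferred from "03_beta" in the name
    args=args,
    train_dataset=preference_dataset,  # placeholder prompt/chosen/rejected columns
    tokenizer=tokenizer,
)
trainer.train()
```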

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6908 | 0.05 | 50 | 0.6958 | -0.0024 | 0.0016 | 0.4835 | -0.0040 | -21.5521 | -20.8002 | 14.2618 | 14.2644 |
| 0.7007 | 0.1 | 100 | 0.6940 | -0.0004 | -0.0001 | 0.5033 | -0.0003 | -21.5577 | -20.7936 | 14.2508 | 14.2534 |
| 0.6945 | 0.15 | 150 | 0.6935 | -0.0010 | -0.0016 | 0.4923 | 0.0006 | -21.5629 | -20.7956 | 14.2501 | 14.2527 |
| 0.6911 | 0.2 | 200 | 0.6947 | 0.0111 | 0.0130 | 0.5055 | -0.0019 | -21.5142 | -20.7552 | 14.2536 | 14.2561 |
| 0.6944 | 0.24 | 250 | 0.6926 | -0.0007 | -0.0032 | 0.5297 | 0.0025 | -21.5681 | -20.7945 | 14.2489 | 14.2515 |
| 0.6893 | 0.29 | 300 | 0.6925 | -0.0029 | -0.0056 | 0.5143 | 0.0027 | -21.5761 | -20.8017 | 14.2454 | 14.2480 |
| 0.6964 | 0.34 | 350 | 0.6933 | -0.0031 | -0.0043 | 0.4901 | 0.0012 | -21.5718 | -20.8026 | 14.2500 | 14.2526 |
| 0.6846 | 0.39 | 400 | 0.6899 | -0.0142 | -0.0220 | 0.5516 | 0.0078 | -21.6306 | -20.8394 | 14.2259 | 14.2284 |
| 0.6823 | 0.44 | 450 | 0.6910 | -0.0143 | -0.0200 | 0.5143 | 0.0056 | -21.6240 | -20.8400 | 14.2294 | 14.2320 |
| 0.6838 | 0.49 | 500 | 0.6908 | -0.0099 | -0.0159 | 0.5297 | 0.0059 | -21.6103 | -20.8253 | 14.2237 | 14.2263 |
| 0.678 | 0.54 | 550 | 0.6897 | -0.0151 | -0.0234 | 0.5407 | 0.0082 | -21.6354 | -20.8427 | 14.2251 | 14.2277 |
| 0.6872 | 0.59 | 600 | 0.6915 | -0.0176 | -0.0223 | 0.5385 | 0.0047 | -21.6318 | -20.8508 | 14.2284 | 14.2311 |
| 0.6881 | 0.64 | 650 | 0.6906 | -0.0132 | -0.0196 | 0.5319 | 0.0064 | -21.6228 | -20.8362 | 14.2236 | 14.2262 |
| 0.6841 | 0.68 | 700 | 0.6910 | -0.0146 | -0.0202 | 0.5143 | 0.0057 | -21.6249 | -20.8408 | 14.2152 | 14.2178 |
| 0.6883 | 0.73 | 750 | 0.6901 | -0.0148 | -0.0223 | 0.5626 | 0.0075 | -21.6317 | -20.8414 | 14.2218 | 14.2244 |
| 0.6813 | 0.78 | 800 | 0.6917 | -0.0150 | -0.0192 | 0.5341 | 0.0041 | -21.6213 | -20.8422 | 14.2255 | 14.2281 |
| 0.6987 | 0.83 | 850 | 0.6902 | -0.0129 | -0.0204 | 0.5297 | 0.0075 | -21.6253 | -20.8350 | 14.2198 | 14.2223 |
| 0.687 | 0.88 | 900 | 0.6928 | -0.0126 | -0.0148 | 0.5121 | 0.0021 | -21.6067 | -20.8343 | 14.2248 | 14.2275 |
| 0.6885 | 0.93 | 950 | 0.6924 | -0.0146 | -0.0175 | 0.5275 | 0.0029 | -21.6159 | -20.8410 | 14.2241 | 14.2267 |
| 0.6904 | 0.98 | 1000 | 0.6924 | -0.0146 | -0.0175 | 0.5275 | 0.0029 | -21.6159 | -20.8410 | 14.2241 | 14.2267 |

Framework versions

  • Transformers 4.39.1
  • Pytorch 2.0.0+cu117
  • Datasets 2.18.0
  • Tokenizers 0.15.2