
tinymistral-248-DPO

This model is a DPO fine-tuned version of Locutusque/TinyMistral-248M on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3205
  • Accuracy: 0.0
  • Rewards/chosen: 0.7722
  • Rewards/rejected: -0.2727
  • Rewards/accuracies: 1.0
  • Rewards/margins: 1.0449
  • Logps/rejected: -286.5494
  • Logps/chosen: -398.5646
  • Logits/rejected: -2.3562
  • Logits/chosen: -1.8620
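
Since the card itself is sparse, here is a minimal loading sketch. It assumes the checkpoint is published as a PEFT adapter on the base model, which the card does not confirm, and the adapter repo id below is a placeholder.

```python
# Minimal loading sketch (not from the card). Assumes the checkpoint is a
# PEFT adapter on Locutusque/TinyMistral-248M; the adapter repo id is a
# placeholder. Version pins from this card: transformers 4.35.2,
# torch 2.1.0+cu121, datasets 2.15.0, tokenizers 0.15.0.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "Locutusque/TinyMistral-248M"
ADAPTER_ID = "<user>/tinymistral-248-DPO"  # placeholder; substitute the real repo id

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
model = AutoModelForCausalLM.from_pretrained(BASE_ID)
model = PeftModel.from_pretrained(model, ADAPTER_ID)  # attach the DPO adapter

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```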

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 12
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 6
  • mixed_precision_training: Native AMP
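
The reward and logp metrics reported above match what TRL's DPOTrainer logs, so the run was very likely a DPO fine-tune via TRL. The sketch below reconstructs an equivalent configuration; the toy dataset, beta value, and output directory are assumptions rather than facts from the card, while the TrainingArguments mirror the hyperparameters listed above.

```python
# Hypothetical reconstruction of this training setup with TRL's DPOTrainer
# (trl ~0.7.x, contemporary with Transformers 4.35.2). Only the
# TrainingArguments values come from the card.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

MODEL_ID = "Locutusque/TinyMistral-248M"
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
ref_model = AutoModelForCausalLM.from_pretrained(MODEL_ID)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# A DPO preference dataset needs prompt/chosen/rejected string columns;
# this toy set only stands in for the real (unspecified) training data.
train_dataset = Dataset.from_dict({
    "prompt": ["Write a haiku about rain.", "Explain gravity briefly."],
    "chosen": ["Rain taps the window / rivers wake in the gutters / the streetlights blur soft.",
               "Gravity is the attraction between masses; larger masses pull harder."],
    "rejected": ["idk", "magic"],
})

args = TrainingArguments(
    output_dir="tinymistral-248-DPO",
    learning_rate=2e-4,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=12,  # 12 x 12 = 144 total train batch size
    lr_scheduler_type="constant",
    warmup_ratio=0.05,
    num_train_epochs=6,
    seed=42,
    fp16=True,                       # Native AMP; requires a CUDA device
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=args,
    beta=0.1,                        # assumed; the card does not state beta
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```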

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5815 | 0.48 | 10 | 0.3205 | 0.0 | 0.7722 | -0.2727 | 1.0 | 1.0449 | -286.5494 | -398.5646 | -2.3562 | -1.8620 |
| 0.3287 | 0.95 | 20 | 0.0970 | 0.0 | 1.0191 | -1.8694 | 1.0 | 2.8886 | -302.5168 | -396.0956 | -2.0547 | -1.5790 |
| 0.2126 | 1.43 | 30 | 0.0414 | 0.0 | 0.3685 | -4.5314 | 1.0 | 4.8999 | -329.1370 | -402.6024 | -1.8100 | -1.4099 |
| 0.1844 | 1.9 | 40 | 0.0260 | 0.0 | 0.9879 | -4.8275 | 1.0 | 5.8153 | -332.0973 | -396.4084 | -1.8704 | -1.4976 |
| 0.1546 | 2.38 | 50 | 0.0190 | 0.0 | 1.1813 | -5.2560 | 1.0 | 6.4373 | -336.3821 | -394.4740 | -1.9098 | -1.5582 |
| 0.1532 | 2.86 | 60 | 0.0140 | 0.0 | 1.0583 | -6.0198 | 1.0 | 7.0780 | -344.0201 | -395.7045 | -1.8920 | -1.5654 |
| 0.1402 | 3.33 | 70 | 0.0112 | 0.0 | 1.0134 | -6.5382 | 1.0 | 7.5517 | -349.2049 | -396.1526 | -1.8823 | -1.5706 |
| 0.1544 | 3.81 | 80 | 0.0089 | 0.0 | 0.8836 | -7.1726 | 1.0 | 8.0562 | -355.5490 | -397.4513 | -1.8518 | -1.5535 |
| 0.1357 | 4.29 | 90 | 0.0072 | 0.0 | 0.7532 | -7.7663 | 1.0 | 8.5195 | -361.4852 | -398.7546 | -1.8193 | -1.5345 |
| 0.1418 | 4.76 | 100 | 0.0061 | 0.0 | 0.6041 | -8.3133 | 1.0 | 8.9174 | -366.9556 | -400.2459 | -1.7889 | -1.5150 |
| 0.1482 | 5.24 | 110 | 0.0051 | 0.0 | 0.4867 | -8.7961 | 1.0 | 9.2828 | -371.7837 | -401.4203 | -1.7611 | -1.4971 |
| 0.141 | 5.71 | 120 | 0.0045 | 0.0 | 0.4212 | -9.1494 | 1.0 | 9.5706 | -375.3166 | -402.0751 | -1.7409 | -1.4842 |

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.0+cu121
  • Datasets 2.15.0
  • Tokenizers 0.15.0