zephyr-7b-dpo-qlora

This model is a fine-tuned version of mistralai/Mistral-7B-v0.1 on an unknown dataset. It achieves the following results on the evaluation set (a short note on how these DPO reward metrics are typically defined follows the list):

  • Logits/chosen: -2.2950
  • Logits/rejected: -2.1831
  • Logps/chosen: -268.8994
  • Logps/rejected: -246.9545
  • Loss: 1.3753
  • Rewards/accuracies: 0.6840
  • Rewards/chosen: 0.1114
  • Rewards/margins: 0.4929
  • Rewards/rejected: -0.3815
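
For context: these are the metrics reported by a standard DPO training setup (for example TRL's DPOTrainer), which this card does not state explicitly, so take the following as a hedged sketch of how the reward columns are usually defined rather than as documentation of this specific run. The implicit reward of a completion is beta times the log-probability ratio between the trained policy and the frozen reference model; Rewards/margins is Rewards/chosen minus Rewards/rejected (here 0.1114 − (−0.3815) = 0.4929), and Rewards/accuracies is the fraction of preference pairs whose chosen completion receives the higher reward.

```latex
% Standard DPO quantities (Rafailov et al., 2023) -- a sketch under the
% assumption of the usual TRL conventions, not taken from this card.
r_\theta(x, y) = \beta \, \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}
  \Big[ \log \sigma\big( r_\theta(x, y_w) - r_\theta(x, y_l) \big) \Big]
```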

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged configuration sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
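
The sketch below restates the hyperparameters above as a transformers.TrainingArguments object, the form a TRL DPOTrainer-style script would typically consume. It is illustrative only: the actual training script, the DPO beta, and the output directory are not recorded in this card, so output_dir is a hypothetical placeholder.

```python
# Hedged sketch: the hyperparameters reported above, expressed as
# transformers.TrainingArguments. The real training script is not part of
# this card; output_dir is a hypothetical placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora",   # hypothetical placeholder
    learning_rate=5e-6,
    per_device_train_batch_size=2,      # 2 per GPU x 4 GPUs x 4 accumulation = 32 effective
    per_device_eval_batch_size=4,       # 4 per GPU x 4 GPUs = 16 effective
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,                     # Adam betas=(0.9, 0.999), epsilon=1e-08
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```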

Training results

| Training Loss | Epoch | Step | Logits/chosen | Logits/rejected | Logps/chosen | Logps/rejected | Validation Loss | Rewards/accuracies | Rewards/chosen | Rewards/margins | Rewards/rejected |
|:-------------:|:------:|:----:|:-------------:|:---------------:|:------------:|:--------------:|:---------------:|:------------------:|:--------------:|:---------------:|:----------------:|
| 1.3628 | 0.0523 | 100 | -2.3171 | -2.2076 | -268.5694 | -245.9993 | 1.3708 | 0.6820 | 0.2269 | 0.2741 | -0.0472 |
| 1.3948 | 0.1047 | 200 | -2.3041 | -2.1937 | -268.7622 | -246.5198 | 1.3925 | 0.6700 | 0.1594 | 0.3888 | -0.2294 |
| 1.4105 | 0.1570 | 300 | -2.3326 | -2.2230 | -269.4514 | -247.3755 | 1.4104 | 0.6820 | -0.0818 | 0.4471 | -0.5289 |
| 1.4014 | 0.2094 | 400 | -2.3264 | -2.2167 | -268.8318 | -246.7196 | 1.4024 | 0.6760 | 0.1350 | 0.4344 | -0.2993 |
| 1.4041 | 0.2617 | 500 | -2.3064 | -2.1950 | -268.4164 | -246.5134 | 1.4132 | 0.6800 | 0.2804 | 0.5076 | -0.2271 |
| 1.419 | 0.3141 | 600 | -2.3018 | -2.1895 | -269.1514 | -246.9937 | 1.4088 | 0.6500 | 0.0232 | 0.4184 | -0.3953 |
| 1.4382 | 0.3664 | 700 | -2.2848 | -2.1715 | -269.7142 | -247.6436 | 1.4137 | 0.6660 | -0.1738 | 0.4489 | -0.6227 |
| 1.4029 | 0.4187 | 800 | -2.3170 | -2.2078 | -269.3091 | -247.1983 | 1.4086 | 0.6640 | -0.0320 | 0.4349 | -0.4669 |
| 1.4076 | 0.4711 | 900 | -2.2777 | -2.1613 | -269.2120 | -247.1355 | 1.4028 | 0.6640 | 0.0020 | 0.4468 | -0.4449 |
| 1.3823 | 0.5234 | 1000 | -2.2891 | -2.1756 | -268.8081 | -246.8032 | 1.3954 | 0.6520 | 0.1433 | 0.4719 | -0.3286 |
| 1.3713 | 0.5758 | 1100 | -2.2961 | -2.1837 | -269.3844 | -247.4280 | 1.3982 | 0.6600 | -0.0584 | 0.4889 | -0.5473 |
| 1.3592 | 0.6281 | 1200 | -2.2972 | -2.1859 | -269.0363 | -247.0839 | 1.3881 | 0.6720 | 0.0634 | 0.4903 | -0.4268 |
| 1.3859 | 0.6805 | 1300 | -2.2892 | -2.1763 | -268.6349 | -246.6918 | 1.3878 | 0.6780 | 0.2040 | 0.4936 | -0.2896 |
| 1.3505 | 0.7328 | 1400 | -2.2898 | -2.1769 | -268.8507 | -247.0505 | 1.3823 | 0.6940 | 0.1284 | 0.5436 | -0.4152 |
| 1.3499 | 0.7851 | 1500 | -2.2921 | -2.1798 | -269.0495 | -247.1410 | 1.3815 | 0.6920 | 0.0588 | 0.5056 | -0.4468 |
| 1.3745 | 0.8375 | 1600 | -2.2933 | -2.1808 | -268.8829 | -246.9300 | 1.3764 | 0.7080 | 0.1172 | 0.4901 | -0.3730 |
| 1.3744 | 0.8898 | 1700 | -2.2950 | -2.1831 | -268.9738 | -246.9943 | 1.3749 | 0.6760 | 0.0853 | 0.4808 | -0.3955 |
| 1.3576 | 0.9422 | 1800 | -2.2944 | -2.1825 | -268.9084 | -246.9460 | 1.3785 | 0.6920 | 0.1082 | 0.4868 | -0.3786 |
| 1.3778 | 0.9945 | 1900 | -2.2950 | -2.1831 | -268.8994 | -246.9545 | 1.3753 | 0.6840 | 0.1114 | 0.4929 | -0.3815 |

Framework versions

  • PEFT 0.10.0
  • Transformers 4.43.1
  • Pytorch 2.1.2+cu121
  • Datasets 2.18.0
  • Tokenizers 0.19.1
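
Since this repository is a PEFT (QLoRA) adapter on top of mistralai/Mistral-7B-v0.1, it can be used by attaching the adapter to the base model. The sketch below is a minimal, hedged example assuming the PEFT and Transformers versions listed above; the dtype and device_map settings are assumptions, and the card does not specify a chat template or generation settings.

```python
# Hedged sketch: load the QLoRA adapter on top of the base model with PEFT.
# dtype/device_map choices and the example prompt are assumptions, not taken
# from this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "Kimory-X/zephyr-7b-dpo-qlora")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# Example prompt (hypothetical); the card does not document a prompt format.
prompt = "Explain direct preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```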