# covid3-mistral-dpo-gptq

This model is a fine-tuned version of [TheBloke/OpenHermes-2-Mistral-7B-GPTQ](https://huggingface.co/TheBloke/OpenHermes-2-Mistral-7B-GPTQ) on an unspecified dataset. It achieves the following results on the evaluation set:

- Loss: 2.2375
- Rewards/chosen: -2.8294
- Rewards/rejected: -1.7077
- Rewards/accuracies: 0.25
- Rewards/margins: -1.1217
- Logps/rejected: -24.0692
- Logps/chosen: -35.7956
- Logits/rejected: -2.8653
- Logits/chosen: -2.8666
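
For context (this is standard DPO bookkeeping, not part of the original card): the `Rewards/*` values are the implicit rewards logged by TRL's `DPOTrainer`, i.e. β-scaled log-probability ratios between the trained policy and the frozen reference model:

```latex
% Implicit DPO reward for prompt x and completion y (beta: the DPO temperature)
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}

% With y_w = chosen and y_l = rejected completion:
%   rewards/margins    = r_\theta(x, y_w) - r_\theta(x, y_l)
%   rewards/accuracies = fraction of eval pairs with a positive margin
```

On this eval set the final margin is negative and the accuracy is 0.25, i.e. the fine-tuned policy ends up assigning a higher implicit reward to the rejected response in three out of four preference pairs.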

## Model description

More information needed

## Intended uses & limitations

More information needed
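
No usage guidance was provided. Below is a minimal inference sketch, assuming the checkpoint is a standard GPTQ model loadable through `transformers` (which requires `optimum` and `auto-gptq` to be installed); the repo id and prompt are placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical Hub path -- substitute wherever this checkpoint actually lives.
model_id = "covid3-mistral-dpo-gptq"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" places the quantized weights on available GPUs.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Summarize current guidance on COVID-19 booster doses."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```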

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2
- training_steps: 1000
- mixed_precision_training: Native AMP
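
These values map directly onto a TRL `DPOTrainer` run of that era (trl ~0.7, matching the Transformers 4.35 listed below). A sketch of such a setup follows; only the hyperparameters listed above come from the card, while the dataset, LoRA configuration, and DPO `beta` are assumptions:

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "TheBloke/OpenHermes-2-Mistral-7B-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# The actual preference data is undocumented; a "prompt"/"chosen"/"rejected"
# JSON file is assumed here purely for illustration.
dataset = load_dataset("json", data_files="preferences.json", split="train")

# Quantized GPTQ weights are frozen, so training goes through LoRA adapters.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

training_args = TrainingArguments(
    output_dir="covid3-mistral-dpo-gptq",
    learning_rate=2e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=2,
    max_steps=1000,
    fp16=True,                     # "Native AMP" mixed precision
    logging_steps=10,
    evaluation_strategy="steps",   # the results table logs eval every 10 steps
    eval_steps=10,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,        # with PEFT, TRL reuses the base model (adapters off) as reference
    args=training_args,
    beta=0.1,              # DPO temperature -- an assumption, not stated in the card
    train_dataset=dataset,
    eval_dataset=dataset,  # the real eval split is undocumented
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```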

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6957 | 0.0 | 10 | 0.6940 | 0.0226 | 0.0252 | 0.375 | -0.0026 | -6.7409 | -7.2761 | -2.8058 | -2.8067 |
| 0.6925 | 0.0 | 20 | 0.6971 | 0.0317 | 0.0422 | 0.3333 | -0.0105 | -6.5702 | -7.1844 | -2.8074 | -2.8082 |
| 0.6876 | 0.01 | 30 | 0.6995 | 0.0202 | 0.0373 | 0.375 | -0.0170 | -6.6197 | -7.2995 | -2.8093 | -2.8102 |
| 0.6961 | 0.01 | 40 | 0.6982 | 0.0054 | 0.0189 | 0.375 | -0.0135 | -6.8034 | -7.4475 | -2.8113 | -2.8122 |
| 0.6863 | 0.01 | 50 | 0.6998 | 0.0019 | 0.0188 | 0.3333 | -0.0169 | -6.8044 | -7.4830 | -2.8121 | -2.8130 |
| 0.6965 | 0.01 | 60 | 0.6977 | 0.0119 | 0.0251 | 0.2917 | -0.0132 | -6.7419 | -7.3829 | -2.8120 | -2.8129 |
| 0.7209 | 0.01 | 70 | 0.6993 | 0.0336 | 0.0497 | 0.3333 | -0.0161 | -6.4949 | -7.1656 | -2.8103 | -2.8112 |
| 0.6988 | 0.01 | 80 | 0.6984 | 0.0294 | 0.0432 | 0.375 | -0.0138 | -6.5605 | -7.2080 | -2.8085 | -2.8094 |
| 0.6913 | 0.01 | 90 | 0.6981 | 0.0216 | 0.0342 | 0.4167 | -0.0126 | -6.6501 | -7.2856 | -2.8084 | -2.8093 |
| 0.6641 | 0.02 | 100 | 0.7030 | 0.0493 | 0.0702 | 0.3333 | -0.0209 | -6.2907 | -7.0088 | -2.8098 | -2.8107 |
| 0.7083 | 0.02 | 110 | 0.7072 | 0.0575 | 0.0870 | 0.3333 | -0.0295 | -6.1225 | -6.9268 | -2.8105 | -2.8114 |
| 0.6307 | 0.02 | 120 | 0.7128 | 0.0727 | 0.1120 | 0.3333 | -0.0393 | -5.8727 | -6.7749 | -2.8105 | -2.8114 |
| 0.7216 | 0.02 | 130 | 0.7158 | 0.0814 | 0.1250 | 0.3333 | -0.0436 | -5.7422 | -6.6879 | -2.8108 | -2.8117 |
| 0.7189 | 0.02 | 140 | 0.7135 | 0.0948 | 0.1343 | 0.3333 | -0.0395 | -5.6489 | -6.5536 | -2.8099 | -2.8108 |
| 0.7177 | 0.03 | 150 | 0.7128 | 0.0954 | 0.1335 | 0.3333 | -0.0381 | -5.6579 | -6.5481 | -2.8100 | -2.8109 |
| 0.639 | 0.03 | 160 | 0.7232 | 0.0823 | 0.1404 | 0.3333 | -0.0581 | -5.5880 | -6.6785 | -2.8135 | -2.8144 |
| 0.7128 | 0.03 | 170 | 0.7361 | 0.0571 | 0.1393 | 0.375 | -0.0822 | -5.5991 | -6.9308 | -2.8165 | -2.8174 |
| 0.709 | 0.03 | 180 | 0.7361 | 0.0690 | 0.1519 | 0.375 | -0.0829 | -5.4739 | -6.8120 | -2.8159 | -2.8168 |
| 0.6167 | 0.03 | 190 | 0.7483 | 0.0424 | 0.1461 | 0.375 | -0.1038 | -5.5311 | -7.0782 | -2.8180 | -2.8189 |
| 0.7521 | 0.03 | 200 | 0.7589 | 0.0180 | 0.1360 | 0.3333 | -0.1180 | -5.6325 | -7.3223 | -2.8199 | -2.8209 |
| 0.6204 | 0.04 | 210 | 0.7726 | -0.0220 | 0.1130 | 0.375 | -0.1350 | -5.8622 | -7.7214 | -2.8217 | -2.8227 |
| 0.6578 | 0.04 | 220 | 0.7839 | -0.0525 | 0.0994 | 0.3333 | -0.1520 | -5.9980 | -8.0273 | -2.8232 | -2.8242 |
| 0.7633 | 0.04 | 230 | 0.7868 | -0.0613 | 0.0902 | 0.375 | -0.1516 | -6.0903 | -8.1152 | -2.8235 | -2.8245 |
| 0.7391 | 0.04 | 240 | 0.7917 | -0.0742 | 0.0850 | 0.375 | -0.1592 | -6.1429 | -8.2441 | -2.8246 | -2.8256 |
| 0.6759 | 0.04 | 250 | 0.8023 | -0.1101 | 0.0656 | 0.3333 | -0.1757 | -6.3368 | -8.6031 | -2.8262 | -2.8272 |
| 0.6768 | 0.04 | 260 | 0.8107 | -0.1470 | 0.0326 | 0.375 | -0.1796 | -6.6662 | -8.9720 | -2.8264 | -2.8274 |
| 0.5398 | 0.04 | 270 | 0.8411 | -0.2390 | -0.0341 | 0.375 | -0.2049 | -7.3331 | -9.8918 | -2.8279 | -2.8289 |
| 0.5617 | 0.05 | 280 | 0.8797 | -0.3532 | -0.1075 | 0.375 | -0.2457 | -8.0674 | -11.0340 | -2.8306 | -2.8317 |
| 0.7585 | 0.05 | 290 | 0.9009 | -0.4183 | -0.1540 | 0.375 | -0.2642 | -8.5328 | -11.6845 | -2.8318 | -2.8329 |
| 0.4971 | 0.05 | 300 | 0.9602 | -0.5793 | -0.2520 | 0.375 | -0.3274 | -9.5121 | -13.2952 | -2.8362 | -2.8373 |
| 0.5759 | 0.05 | 310 | 1.0568 | -0.8155 | -0.3982 | 0.375 | -0.4173 | -10.9749 | -15.6568 | -2.8426 | -2.8437 |
| 0.451 | 0.05 | 320 | 1.1605 | -1.0527 | -0.5383 | 0.375 | -0.5144 | -12.3754 | -18.0287 | -2.8482 | -2.8493 |
| 1.4199 | 0.06 | 330 | 1.1756 | -1.1393 | -0.6287 | 0.375 | -0.5106 | -13.2791 | -18.8948 | -2.8505 | -2.8516 |
| 0.6853 | 0.06 | 340 | 1.1875 | -1.1840 | -0.6936 | 0.375 | -0.4904 | -13.9281 | -19.3416 | -2.8530 | -2.8541 |
| 0.3956 | 0.06 | 350 | 1.2550 | -1.2944 | -0.7654 | 0.375 | -0.5291 | -14.6460 | -20.4463 | -2.8568 | -2.8579 |
| 0.8692 | 0.06 | 360 | 1.3093 | -1.4107 | -0.8644 | 0.375 | -0.5463 | -15.6363 | -21.6084 | -2.8602 | -2.8613 |
| 1.4214 | 0.06 | 370 | 1.2759 | -1.3853 | -0.8782 | 0.375 | -0.5071 | -15.7746 | -21.3549 | -2.8579 | -2.8590 |
| 0.6163 | 0.06 | 380 | 1.3124 | -1.4537 | -0.9274 | 0.375 | -0.5263 | -16.2665 | -22.0389 | -2.8580 | -2.8591 |
| 0.586 | 0.07 | 390 | 1.4060 | -1.6073 | -1.0263 | 0.375 | -0.5810 | -17.2554 | -23.5750 | -2.8594 | -2.8605 |
| 1.7565 | 0.07 | 400 | 1.3869 | -1.5469 | -0.9534 | 0.375 | -0.5936 | -16.5259 | -22.9709 | -2.8611 | -2.8623 |
| 0.749 | 0.07 | 410 | 1.4037 | -1.5658 | -0.9400 | 0.375 | -0.6258 | -16.3927 | -23.1602 | -2.8615 | -2.8626 |
| 0.7682 | 0.07 | 420 | 1.4444 | -1.6154 | -0.9575 | 0.375 | -0.6578 | -16.5678 | -23.6556 | -2.8618 | -2.8630 |
| 0.5276 | 0.07 | 430 | 1.5646 | -1.7833 | -1.0365 | 0.375 | -0.7467 | -17.3576 | -25.3345 | -2.8645 | -2.8658 |
| 1.2132 | 0.07 | 440 | 1.6229 | -1.8510 | -1.0641 | 0.375 | -0.7869 | -17.6336 | -26.0119 | -2.8657 | -2.8670 |
| 1.0323 | 0.07 | 450 | 1.6468 | -1.8672 | -1.0528 | 0.3333 | -0.8143 | -17.5208 | -26.1736 | -2.8655 | -2.8668 |
| 1.1453 | 0.08 | 460 | 1.6741 | -1.8759 | -1.0266 | 0.3333 | -0.8494 | -17.2580 | -26.2613 | -2.8659 | -2.8672 |
| 1.526 | 0.08 | 470 | 1.6465 | -1.8347 | -1.0076 | 0.3333 | -0.8271 | -17.0681 | -25.8488 | -2.8671 | -2.8684 |
| 1.1323 | 0.08 | 480 | 1.5543 | -1.7064 | -0.9557 | 0.3333 | -0.7507 | -16.5494 | -24.5655 | -2.8682 | -2.8694 |
| 1.0389 | 0.08 | 490 | 1.5824 | -1.7717 | -1.0002 | 0.3333 | -0.7715 | -16.9945 | -25.2190 | -2.8694 | -2.8706 |
| 0.8626 | 0.08 | 500 | 1.6038 | -1.8376 | -1.0545 | 0.3333 | -0.7831 | -17.5374 | -25.8781 | -2.8693 | -2.8706 |
| 0.8392 | 0.09 | 510 | 1.6952 | -1.9873 | -1.1387 | 0.3333 | -0.8486 | -18.3790 | -27.3744 | -2.8697 | -2.8710 |
| 0.6528 | 0.09 | 520 | 1.7895 | -2.1144 | -1.1842 | 0.25 | -0.9302 | -18.8344 | -28.6457 | -2.8693 | -2.8707 |
| 1.3843 | 0.09 | 530 | 1.8088 | -2.1501 | -1.2043 | 0.25 | -0.9458 | -19.0354 | -29.0030 | -2.8696 | -2.8710 |
| 1.296 | 0.09 | 540 | 1.7833 | -2.1309 | -1.2130 | 0.25 | -0.9178 | -19.1228 | -28.8106 | -2.8691 | -2.8705 |
| 0.7343 | 0.09 | 550 | 1.8244 | -2.1833 | -1.2404 | 0.25 | -0.9428 | -19.3968 | -29.3344 | -2.8676 | -2.8689 |
| 1.089 | 0.09 | 560 | 1.8288 | -2.1789 | -1.2313 | 0.25 | -0.9476 | -19.3059 | -29.2912 | -2.8690 | -2.8704 |
| 0.8322 | 0.1 | 570 | 1.9009 | -2.2715 | -1.2811 | 0.25 | -0.9903 | -19.8038 | -30.2165 | -2.8697 | -2.8711 |
| 0.8684 | 0.1 | 580 | 1.9310 | -2.3144 | -1.3151 | 0.25 | -0.9993 | -20.1433 | -30.6454 | -2.8722 | -2.8736 |
| 0.9827 | 0.1 | 590 | 1.9558 | -2.3309 | -1.3222 | 0.25 | -1.0087 | -20.2145 | -30.8112 | -2.8740 | -2.8754 |
| 0.5176 | 0.1 | 600 | 1.9731 | -2.3665 | -1.3574 | 0.25 | -1.0091 | -20.5666 | -31.1672 | -2.8754 | -2.8768 |
| 1.0789 | 0.1 | 610 | 2.0276 | -2.4550 | -1.4152 | 0.25 | -1.0398 | -21.1444 | -32.0516 | -2.8756 | -2.8769 |
| 0.8444 | 0.1 | 620 | 2.1331 | -2.6253 | -1.5121 | 0.25 | -1.1132 | -22.1132 | -33.7550 | -2.8726 | -2.8739 |
| 1.6609 | 0.1 | 630 | 2.1160 | -2.6511 | -1.5573 | 0.25 | -1.0938 | -22.5657 | -34.0127 | -2.8740 | -2.8753 |
| 1.3086 | 0.11 | 640 | 2.0791 | -2.6721 | -1.6152 | 0.25 | -1.0569 | -23.1446 | -34.2231 | -2.8749 | -2.8762 |
| 1.0659 | 0.11 | 650 | 2.0520 | -2.6575 | -1.6184 | 0.25 | -1.0391 | -23.1760 | -34.0763 | -2.8766 | -2.8778 |
| 1.3081 | 0.11 | 660 | 2.0481 | -2.6650 | -1.6332 | 0.25 | -1.0318 | -23.3247 | -34.1520 | -2.8756 | -2.8769 |
| 0.769 | 0.11 | 670 | 2.0971 | -2.7165 | -1.6666 | 0.25 | -1.0500 | -23.6581 | -34.6672 | -2.8745 | -2.8758 |
| 1.1385 | 0.11 | 680 | 2.1554 | -2.7771 | -1.7021 | 0.25 | -1.0750 | -24.0137 | -35.2731 | -2.8735 | -2.8748 |
| 1.0306 | 0.12 | 690 | 2.2076 | -2.8501 | -1.7587 | 0.25 | -1.0914 | -24.5793 | -36.0025 | -2.8714 | -2.8727 |
| 1.3893 | 0.12 | 700 | 2.2299 | -2.8955 | -1.7944 | 0.25 | -1.1010 | -24.9367 | -36.4564 | -2.8682 | -2.8695 |
| 2.2234 | 0.12 | 710 | 2.2237 | -2.9162 | -1.8126 | 0.25 | -1.1036 | -25.1184 | -36.6639 | -2.8654 | -2.8667 |
| 0.4678 | 0.12 | 720 | 2.2379 | -2.9096 | -1.7873 | 0.25 | -1.1223 | -24.8652 | -36.5974 | -2.8658 | -2.8671 |
| 0.8098 | 0.12 | 730 | 2.2768 | -2.9290 | -1.7762 | 0.25 | -1.1529 | -24.7543 | -36.7922 | -2.8652 | -2.8665 |
| 1.8821 | 0.12 | 740 | 2.2740 | -2.9198 | -1.7623 | 0.25 | -1.1574 | -24.6159 | -36.6994 | -2.8641 | -2.8654 |
| 1.095 | 0.12 | 750 | 2.2689 | -2.8862 | -1.7174 | 0.25 | -1.1688 | -24.1662 | -36.3637 | -2.8647 | -2.8660 |
| 1.7464 | 0.13 | 760 | 2.2488 | -2.8828 | -1.7320 | 0.25 | -1.1508 | -24.3128 | -36.3297 | -2.8640 | -2.8653 |
| 0.9967 | 0.13 | 770 | 2.2235 | -2.8783 | -1.7502 | 0.25 | -1.1281 | -24.4945 | -36.2849 | -2.8622 | -2.8634 |
| 0.7823 | 0.13 | 780 | 2.2370 | -2.9074 | -1.7744 | 0.25 | -1.1330 | -24.7361 | -36.5759 | -2.8593 | -2.8606 |
| 1.3903 | 0.13 | 790 | 2.2755 | -2.9143 | -1.7485 | 0.25 | -1.1658 | -24.4774 | -36.6450 | -2.8587 | -2.8600 |
| 2.0372 | 0.13 | 800 | 2.2250 | -2.7892 | -1.6505 | 0.25 | -1.1387 | -23.4972 | -35.3939 | -2.8629 | -2.8642 |
| 0.7111 | 0.14 | 810 | 2.2409 | -2.7911 | -1.6348 | 0.25 | -1.1562 | -23.3407 | -35.4124 | -2.8642 | -2.8654 |
| 0.8446 | 0.14 | 820 | 2.2740 | -2.8395 | -1.6646 | 0.25 | -1.1749 | -23.6383 | -35.8968 | -2.8638 | -2.8651 |
| 1.2303 | 0.14 | 830 | 2.2812 | -2.8540 | -1.6787 | 0.25 | -1.1752 | -23.7798 | -36.0417 | -2.8648 | -2.8661 |
| 0.5053 | 0.14 | 840 | 2.2834 | -2.8740 | -1.7065 | 0.25 | -1.1675 | -24.0571 | -36.2418 | -2.8640 | -2.8653 |
| 0.5767 | 0.14 | 850 | 2.3105 | -2.9262 | -1.7448 | 0.25 | -1.1814 | -24.4399 | -36.7635 | -2.8618 | -2.8631 |
| 1.7435 | 0.14 | 860 | 2.3174 | -2.9360 | -1.7519 | 0.25 | -1.1841 | -24.5119 | -36.8619 | -2.8627 | -2.8639 |
| 1.6134 | 0.14 | 870 | 2.3028 | -2.9288 | -1.7659 | 0.25 | -1.1629 | -24.6517 | -36.7902 | -2.8635 | -2.8647 |
| 1.747 | 0.15 | 880 | 2.2686 | -2.8780 | -1.7398 | 0.25 | -1.1382 | -24.3902 | -36.2816 | -2.8658 | -2.8671 |
| 1.3341 | 0.15 | 890 | 2.2555 | -2.8559 | -1.7244 | 0.25 | -1.1315 | -24.2361 | -36.0610 | -2.8673 | -2.8686 |
| 1.884 | 0.15 | 900 | 2.2349 | -2.8291 | -1.7129 | 0.25 | -1.1162 | -24.1211 | -35.7924 | -2.8677 | -2.8689 |
| 0.5031 | 0.15 | 910 | 2.2361 | -2.8327 | -1.7156 | 0.25 | -1.1171 | -24.1479 | -35.8284 | -2.8671 | -2.8684 |
| 0.7273 | 0.15 | 920 | 2.2545 | -2.8595 | -1.7291 | 0.25 | -1.1304 | -24.2834 | -36.0963 | -2.8665 | -2.8678 |
| 1.2208 | 0.15 | 930 | 2.2655 | -2.8756 | -1.7364 | 0.25 | -1.1393 | -24.3561 | -36.2580 | -2.8656 | -2.8669 |
| 0.6928 | 0.16 | 940 | 2.2697 | -2.8817 | -1.7405 | 0.25 | -1.1412 | -24.3971 | -36.3184 | -2.8652 | -2.8665 |
| 2.2099 | 0.16 | 950 | 2.2581 | -2.8642 | -1.7302 | 0.25 | -1.1340 | -24.2945 | -36.1442 | -2.8656 | -2.8668 |
| 1.6883 | 0.16 | 960 | 2.2544 | -2.8575 | -1.7258 | 0.25 | -1.1318 | -24.2503 | -36.0772 | -2.8656 | -2.8668 |
| 1.9968 | 0.16 | 970 | 2.2455 | -2.8405 | -1.7135 | 0.25 | -1.1271 | -24.1270 | -35.9072 | -2.8657 | -2.8670 |
| 2.1044 | 0.16 | 980 | 2.2400 | -2.8308 | -1.7064 | 0.25 | -1.1243 | -24.0569 | -35.8097 | -2.8656 | -2.8668 |
| 0.7207 | 0.17 | 990 | 2.2376 | -2.8286 | -1.7067 | 0.25 | -1.1218 | -24.0597 | -35.7875 | -2.8654 | -2.8667 |
| 1.1388 | 0.17 | 1000 | 2.2375 | -2.8294 | -1.7077 | 0.25 | -1.1217 | -24.0692 | -35.7956 | -2.8653 | -2.8666 |
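
As a quick consistency check on the final row (step 1000), `Rewards/margins` is simply the difference of the two reward columns:

```python
rewards_chosen, rewards_rejected = -2.8294, -1.7077
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # -1.1217, matching the reported Rewards/margins
```

The steadily rising validation loss (0.6940 at step 10 to 2.2375 at step 1000) together with the persistently negative margins suggests the run drifted away from the preference data rather than fitting it.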

### Framework versions

- Transformers 4.35.2
- Pytorch 2.0.1+cu117
- Datasets 2.15.0
- Tokenizers 0.15.0