---
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
  - generated_from_trainer
model-index:
  - name: notus-7b-dpo
    results: []
---

# notus-7b-dpo

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full); the training dataset is not documented in this card. It achieves the following results on the evaluation set (a note on reading the reward metrics follows the list):

- Loss: 0.4730
- Rewards/chosen: -3.5289
- Rewards/rejected: -7.3700
- Rewards/accuracies: 0.8016
- Rewards/margins: 3.8412
- Logps/rejected: -316.3751
- Logps/chosen: -334.3053
- Logits/rejected: -2.1644
- Logits/chosen: -2.4556
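The reward and log-probability metrics above follow the conventions of DPO-style preference training as logged by TRL's `DPOTrainer`; the card does not state this explicitly, so treat the reading below as an assumption. Under DPO, the implicit reward of a completion $y$ for a prompt $x$ is

$$
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right),
$$

where $\pi_\theta$ is the policy being trained, $\pi_{\mathrm{ref}}$ is the frozen SFT reference model, and $\beta$ is the DPO temperature. `Rewards/margins` is then the mean of $r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})$ over the evaluation set (here $-3.5289 - (-7.3700) = 3.8411 \approx 3.8412$, matching up to rounding), and `Rewards/accuracies` is the fraction of evaluation pairs in which the chosen completion receives the higher reward.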

## Model description

More information needed

## Intended uses & limitations

More information needed
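Pending that documentation, the sketch below shows one way to run the model for chat-style generation with `transformers`. The repo id is a placeholder inferred from this card's model name, and the chat template is assumed to be the Zephyr-style one inherited from the SFT base model; neither is confirmed by this card.

```python
import torch
from transformers import pipeline

# Minimal inference sketch. The checkpoint id below is a placeholder
# guess based on this card's model name; substitute the actual repo id.
pipe = pipeline(
    "text-generation",
    model="dvilasuero/notus-7b-dpo",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the `accelerate` package
)

# Assumes the Zephyr-style chat template inherited from the SFT base model.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize what DPO fine-tuning does."},
]
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
print(outputs[0]["generated_text"])
```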

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
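As a reproduction aid, here is a minimal sketch of how the values above map onto `transformers.TrainingArguments`. Only the values listed in this card are taken as given; the output directory is a placeholder, and DPO-specific options (such as the `beta` temperature) are not documented here, so they are omitted.

```python
from transformers import TrainingArguments

# Sketch only: maps the hyperparameters listed above onto TrainingArguments.
training_args = TrainingArguments(
    output_dir="notus-7b-dpo",      # placeholder, not stated in the card
    learning_rate=5e-7,
    per_device_train_batch_size=8,  # train_batch_size above
    per_device_eval_batch_size=4,   # eval_batch_size above
    seed=42,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    adam_beta1=0.9,                 # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)

# With 8 GPUs, per-device batch sizes of 8 (train) and 4 (eval) give the
# listed totals: 8 * 8 = 64 and 4 * 8 = 32.
```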

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5051        | 0.1   | 100  | 0.5180          | 0.1475         | -0.3954          | 0.7183             | 0.5429          | -246.6286      | -297.5412    | -2.7438         | -3.0431       |
| 0.4321        | 0.21  | 200  | 0.4375          | 0.1353         | -0.9529          | 0.7540             | 1.0882          | -252.2036      | -297.6632    | -2.7578         | -3.0543       |
| 0.3848        | 0.31  | 300  | 0.4301          | -0.4813        | -1.8921          | 0.7302             | 1.4107          | -261.5956      | -303.8301    | -2.7592         | -3.0508       |
| 0.3777        | 0.42  | 400  | 0.4091          | -0.8597        | -2.5306          | 0.7698             | 1.6709          | -267.9805      | -307.6138    | -2.7476         | -3.0474       |
| 0.3559        | 0.52  | 500  | 0.4332          | -1.0424        | -2.6019          | 0.7619             | 1.5595          | -268.6939      | -309.4406    | -2.2960         | -2.6106       |
| 0.4178        | 0.62  | 600  | 0.3934          | -0.6434        | -2.4837          | 0.7659             | 1.8404          | -267.5121      | -305.4503    | -2.5487         | -2.8508       |
| 0.4206        | 0.73  | 700  | 0.4058          | -1.4700        | -3.5113          | 0.7857             | 2.0413          | -277.7877      | -313.7168    | -2.5679         | -2.8727       |
| 0.4323        | 0.83  | 800  | 0.3929          | -0.9025        | -2.6935          | 0.7897             | 1.7910          | -269.6095      | -308.0414    | -2.6213         | -2.9202       |
| 0.3706        | 0.93  | 900  | 0.3903          | -1.1122        | -3.0257          | 0.8056             | 1.9135          | -272.9316      | -310.1388    | -2.5428         | -2.8416       |
| 0.0496        | 1.04  | 1000 | 0.3991          | -1.4248        | -4.1245          | 0.8016             | 2.6997          | -283.9196      | -313.2651    | -2.5093         | -2.8150       |
| 0.0723        | 1.14  | 1100 | 0.3999          | -1.8789        | -4.5317          | 0.7897             | 2.6528          | -287.9914      | -317.8056    | -2.5170         | -2.8242       |
| 0.0481        | 1.25  | 1200 | 0.4191          | -2.6211        | -5.5294          | 0.7817             | 2.9083          | -297.9687      | -325.2281    | -2.5139         | -2.8109       |
| 0.0432        | 1.35  | 1300 | 0.4070          | -2.0605        | -5.0460          | 0.8056             | 2.9855          | -293.1345      | -319.6214    | -2.5153         | -2.8121       |
| 0.0402        | 1.45  | 1400 | 0.4001          | -2.2445        | -5.0942          | 0.7937             | 2.8497          | -293.6164      | -321.4614    | -2.4383         | -2.7388       |
| 0.0529        | 1.56  | 1500 | 0.4066          | -2.3499        | -5.2468          | 0.8016             | 2.8969          | -295.1426      | -322.5153    | -2.3906         | -2.6963       |
| 0.0651        | 1.66  | 1600 | 0.3962          | -2.0597        | -4.8915          | 0.8016             | 2.8318          | -291.5901      | -319.6136    | -2.3390         | -2.6469       |
| 0.0738        | 1.77  | 1700 | 0.3942          | -1.8893        | -4.6107          | 0.8135             | 2.7214          | -288.7817      | -317.9099    | -2.3532         | -2.6607       |
| 0.0597        | 1.87  | 1800 | 0.3990          | -1.8774        | -4.7221          | 0.8175             | 2.8448          | -289.8961      | -317.7905    | -2.2728         | -2.5908       |
| 0.0686        | 1.97  | 1900 | 0.3924          | -1.8745        | -4.6807          | 0.8056             | 2.8062          | -289.4821      | -317.7617    | -2.2554         | -2.5658       |
| 0.0116        | 2.08  | 2000 | 0.4260          | -2.4687        | -5.7190          | 0.7937             | 3.2503          | -299.8647      | -323.7037    | -2.2297         | -2.5347       |
| 0.0114        | 2.18  | 2100 | 0.4519          | -2.8266        | -6.3706          | 0.7976             | 3.5440          | -306.3802      | -327.2823    | -2.2185         | -2.5219       |
| 0.0073        | 2.28  | 2200 | 0.4563          | -2.9422        | -6.5564          | 0.8016             | 3.6142          | -308.2384      | -328.4384    | -2.2103         | -2.5126       |
| 0.0094        | 2.39  | 2300 | 0.4636          | -3.3246        | -7.0542          | 0.8016             | 3.7296          | -313.2165      | -332.2628    | -2.2059         | -2.5081       |
| 0.0056        | 2.49  | 2400 | 0.4745          | -3.3599        | -7.1652          | 0.7976             | 3.8053          | -314.3266      | -332.6161    | -2.1945         | -2.4943       |
| 0.0052        | 2.6   | 2500 | 0.4812          | -3.4916        | -7.3391          | 0.7976             | 3.8475          | -316.0656      | -333.9322    | -2.1888         | -2.4881       |
| 0.0065        | 2.7   | 2600 | 0.4678          | -3.2226        | -6.9887          | 0.7976             | 3.7661          | -312.5613      | -331.2425    | -2.1644         | -2.4560       |
| 0.0059        | 2.8   | 2700 | 0.4694          | -3.4307        | -7.2484          | 0.7976             | 3.8177          | -315.1584      | -333.3234    | -2.1572         | -2.4483       |
| 0.0054        | 2.91  | 2800 | 0.4707          | -3.4959        | -7.3283          | 0.8056             | 3.8324          | -315.9576      | -333.9758    | -2.1575         | -2.4491       |

### Framework versions

- Transformers 4.35.0
- Pytorch 2.1.1+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1
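Before attempting to reproduce the results, the pins above can be checked against the local environment; a small sketch:

```python
import transformers, torch, datasets, tokenizers

# Compare installed versions against the pins reported in this card.
pins = {
    "transformers": ("4.35.0", transformers.__version__),
    "torch": ("2.1.1+cu121", torch.__version__),
    "datasets": ("2.14.6", datasets.__version__),
    "tokenizers": ("0.14.1", tokenizers.__version__),
}
for name, (expected, installed) in pins.items():
    status = "OK" if installed == expected else f"MISMATCH (installed {installed})"
    print(f"{name}=={expected}: {status}")
```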