---
model-index:
  - name: notus-7b-v1
    results: []
datasets:
  - argilla/ultrafeedback-binarized-preferences
language:
  - en
base_model: alignment-handbook/zephyr-7b-sft-full
library_name: transformers
pipeline_tag: text-generation
tags:
  - dpo
  - preference
  - ultrafeedback
license: mit
---
*Image artificially generated by DALL·E 3 via ChatGPT Pro*

# Model Card for Notus 7B v1

Notus is a collection of models fine-tuned with Direct Preference Optimization (DPO) and related RLHF techniques. This model is version 1, fine-tuned with DPO on top of zephyr-7b-beta's SFT model (alignment-handbook/zephyr-7b-sft-full).

Following a data-first approach, the only difference between Notus-7B-v1 and Zephyr-7B-beta is the preference dataset used for dDPO. In particular, we found data issues in the original UltraFeedback dataset that led to high scores for bad responses. After curating several hundred data points, we decided to binarize the dataset using the preference ratings instead of the critique's overall_score. Using preference ratings instead of critique scores produced a new dataset where the chosen response differs in ~50% of the cases.
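
As a rough illustration of this re-binarization, the sketch below keeps, for each prompt, the completion with the highest average per-aspect preference rating as `chosen` and a lower-rated one as `rejected`. This is not Argilla's actual curation code: the field names (`instruction`, `completions`, `annotations`, `Rating`) follow the published openbmb/UltraFeedback layout and may need adjusting, and the real dataset applies additional curation on top of this idea.

```python
# Minimal sketch (not the actual curation pipeline) of binarizing UltraFeedback
# by average preference ratings instead of the critique's overall_score.
from datasets import load_dataset


def avg_rating(completion: dict) -> float:
    """Average the per-aspect ratings (helpfulness, honesty, ...) of one completion."""
    ratings = [
        float(aspect["Rating"])
        for aspect in completion["annotations"].values()
        if aspect.get("Rating", "N/A") != "N/A"
    ]
    return sum(ratings) / len(ratings) if ratings else 0.0


def binarize(example: dict) -> dict:
    """Top-rated completion becomes `chosen`, the lowest-rated becomes `rejected`."""
    ranked = sorted(example["completions"], key=avg_rating, reverse=True)
    return {
        "prompt": example["instruction"],
        "chosen": ranked[0]["response"],
        "rejected": ranked[-1]["response"],
    }


ultrafeedback = load_dataset("openbmb/UltraFeedback", split="train")
ultrafeedback = ultrafeedback.filter(lambda ex: len(ex["completions"]) > 1)
binarized = ultrafeedback.map(binarize, remove_columns=ultrafeedback.column_names)
```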

This model wouldn't have been possible without the amazing Alignment Handbook, and it builds on fruitful discussions with the HuggingFace H4 team. In particular, we used zephyr-7b-beta's recipe, which worked out of the box and enabled us to focus on what we do best: high-quality data.

Notus models are intended to be used as assistants via chat-like applications, and are evaluated with Chat (MT-Bench, AlpacaEval) and Academic (Open LLM Leaderboard) benchmarks for a direct comparison with the original Zephyr dDPO model and other 7B models.
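
For instance, a minimal chat-style usage sketch with `transformers` could look like the following; the system prompt and generation settings are illustrative placeholders rather than an official recipe, and the chat template is assumed to be the Zephyr-style template shipped with the tokenizer.

```python
# Minimal usage sketch: chat-style generation with the model's own chat template.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="argilla/notus-7b-v1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},  # illustrative placeholder
    {"role": "user", "content": "Explain Direct Preference Optimization in two sentences."},
]
# Render the messages with the tokenizer's chat template before generating.
prompt = generator.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
outputs = generator(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
print(outputs[0]["generated_text"])
```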

## Model Details

### Model Description

- **Developed by:** Argilla (building on the previous efforts and amazing work of HuggingFace H4 and MistralAI)
- **Shared by:** Argilla
- **Model type:** GPT-like 7B model fine-tuned with DPO
- **Language(s) (NLP):** Mainly English
- **License:** MIT (same as Zephyr 7B-beta)
- **Finetuned from model:** alignment-handbook/zephyr-7b-sft-full

### Model Sources

## Performance

### Chat benchmarks

Table adapted from Zephyr-7b-β's and Starling's original tables for the MT-Bench and AlpacaEval benchmarks. Results are sorted by AlpacaEval win rate and omit some >7B models for brevity. Notus stays on par with Zephyr on MT-Bench, while surpassing Zephyr, Claude 2, and Cohere Command on AlpacaEval, making Notus the most competitive commercially licensed 7B model on AlpacaEval.

| Model | Size | Alignment | MT-Bench (score) | AlpacaEval (win rate %) | License |
|---|---|---|---|---|---|
| GPT-4-turbo | - | ? | 9.32 | 97.70 | Proprietary |
| XwinLM 70b V0.1 | 70B | dPPO | - | 95.57 | LLaMA 2 License |
| GPT-4 | - | RLHF | 8.99 | 95.03 | Proprietary |
| Tulu 2+DPO 70B V0.1 | 70B | dDPO | 6.29 | 95.28 | Proprietary |
| LLaMA2 Chat 70B | 70B | RLHF | 6.86 | 92.66 | LLaMA 2 License |
| Starling-7B | 7B | C-RLFT + APA | 8.09 | 91.99 | CC-BY-NC-4.0 |
| **Notus-7b-v1** | 7B | dDPO | 7.30 | 91.42 | MIT |
| Claude 2 | - | RLHF | 8.06 | 91.36 | Proprietary |
| Zephyr-7b-β | 7B | dDPO | 7.34 | 90.60 | MIT |
| Cohere Command | - | RLHF | - | 90.62 | Proprietary |
| GPT-3.5-turbo | - | RLHF | 7.94 | 89.37 | Proprietary |

### Academic benchmarks

Results from the Open LLM Leaderboard:

| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K | DROP |
|---|---|---|---|---|---|---|---|---|
| Zephyr 7B dDPO (HuggingFaceH4/zephyr-7b-beta) | 52.15 | 62.03 | 84.36 | 61.07 | 57.45 | 77.74 | 12.74 | 9.66 |
| argilla/notus-7b-v1 | 52.89 | 64.59 | 84.78 | 63.03 | 54.37 | 79.4 | 15.16 | 8.91 |
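
These numbers come from the leaderboard's own evaluation setup. As a hedged sketch, a roughly comparable single-task run with a recent lm-evaluation-harness could look like the snippet below; the harness version, task names, and few-shot settings are assumptions and may differ from the exact leaderboard configuration used for the table above.

```python
# Rough sketch of reproducing one academic benchmark locally with lm-evaluation-harness.
# The exact leaderboard harness commit and settings may differ from this approximation.
import json

from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained=argilla/notus-7b-v1,dtype=bfloat16",
    tasks=["arc_challenge"],  # ARC is run 25-shot on the Open LLM Leaderboard
    num_fewshot=25,
    batch_size=8,
)
print(json.dumps(results["results"], indent=2, default=str))
```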

## Training Details

### Training Hardware

We used a VM with 8 × A100 40GB GPUs hosted on Lambda Labs.

### Training Data

We used a new curated version of openbmb/UltraFeedback, named argilla/ultrafeedback-binarized-avg-rating-for-dpo.
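
As a quick sanity check, the curated preference data can be inspected directly with `datasets`; the snippet below uses the dataset id listed in this card's metadata and only prints the schema, without assuming specific column names or splits beyond `train`.

```python
# Inspect the curated preference dataset referenced in this card's metadata.
from datasets import load_dataset

preferences = load_dataset("argilla/ultrafeedback-binarized-preferences", split="train")
print(preferences)           # row count and column names
print(preferences.features)  # full schema of each preference record
```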

### Training hyperparameters

The following hyperparameters were used during training; a sketch of how they might map onto a DPO trainer setup follows the list:

- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
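
The sketch below is an assumed reconstruction of how these values could map onto a TRL `DPOTrainer` run (API as of TRL ~0.7, contemporary with the framework versions listed further down). The actual training used the Alignment Handbook recipe; `beta`, the maximum lengths, the evaluation split, and the preference column names are assumptions, not values taken from this card.

```python
# Assumed reconstruction of the DPO run with TRL's DPOTrainer; the real training
# used the Alignment Handbook recipe, so treat every value not listed above as a guess.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_id = "alignment-handbook/zephyr-7b-sft-full"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Preference data; columns may need renaming to "prompt"/"chosen"/"rejected".
dataset = load_dataset("argilla/ultrafeedback-binarized-preferences", split="train")
splits = dataset.train_test_split(test_size=0.05, seed=42)  # assumed eval split

training_args = TrainingArguments(
    output_dir="notus-7b-v1-dpo",
    learning_rate=5e-7,
    per_device_train_batch_size=8,   # x 8 GPUs = total train batch size 64
    per_device_eval_batch_size=4,    # x 8 GPUs = total eval batch size 32
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                       # assumption: bfloat16 on A100s
    evaluation_strategy="steps",
    eval_steps=100,                  # matches the 100-step cadence in the results table
    logging_steps=100,
)

trainer = DPOTrainer(
    model,
    ref_model=None,                  # TRL builds a frozen reference copy of the model
    args=training_args,
    beta=0.1,                        # assumption: Zephyr's default DPO beta
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
    tokenizer=tokenizer,
    max_length=1024,                 # assumption
    max_prompt_length=512,           # assumption
)
trainer.train()
```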

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5051 | 0.1 | 100 | 0.5180 | 0.1475 | -0.3954 | 0.7183 | 0.5429 | -246.6286 | -297.5412 | -2.7438 | -3.0431 |
| 0.4321 | 0.21 | 200 | 0.4375 | 0.1353 | -0.9529 | 0.7540 | 1.0882 | -252.2036 | -297.6632 | -2.7578 | -3.0543 |
| 0.3848 | 0.31 | 300 | 0.4301 | -0.4813 | -1.8921 | 0.7302 | 1.4107 | -261.5956 | -303.8301 | -2.7592 | -3.0508 |
| 0.3777 | 0.42 | 400 | 0.4091 | -0.8597 | -2.5306 | 0.7698 | 1.6709 | -267.9805 | -307.6138 | -2.7476 | -3.0474 |
| 0.3559 | 0.52 | 500 | 0.4332 | -1.0424 | -2.6019 | 0.7619 | 1.5595 | -268.6939 | -309.4406 | -2.2960 | -2.6106 |
| 0.4178 | 0.62 | 600 | 0.3934 | -0.6434 | -2.4837 | 0.7659 | 1.8404 | -267.5121 | -305.4503 | -2.5487 | -2.8508 |
| 0.4206 | 0.73 | 700 | 0.4058 | -1.4700 | -3.5113 | 0.7857 | 2.0413 | -277.7877 | -313.7168 | -2.5679 | -2.8727 |
| 0.4323 | 0.83 | 800 | 0.3929 | -0.9025 | -2.6935 | 0.7897 | 1.7910 | -269.6095 | -308.0414 | -2.6213 | -2.9202 |
| 0.3706 | 0.93 | 900 | 0.3903 | -1.1122 | -3.0257 | 0.8056 | 1.9135 | -272.9316 | -310.1388 | -2.5428 | -2.8416 |
| 0.0496 | 1.04 | 1000 | 0.3991 | -1.4248 | -4.1245 | 0.8016 | 2.6997 | -283.9196 | -313.2651 | -2.5093 | -2.8150 |
| 0.0723 | 1.14 | 1100 | 0.3999 | -1.8789 | -4.5317 | 0.7897 | 2.6528 | -287.9914 | -317.8056 | -2.5170 | -2.8242 |
| 0.0481 | 1.25 | 1200 | 0.4191 | -2.6211 | -5.5294 | 0.7817 | 2.9083 | -297.9687 | -325.2281 | -2.5139 | -2.8109 |
| 0.0432 | 1.35 | 1300 | 0.4070 | -2.0605 | -5.0460 | 0.8056 | 2.9855 | -293.1345 | -319.6214 | -2.5153 | -2.8121 |
| 0.0402 | 1.45 | 1400 | 0.4001 | -2.2445 | -5.0942 | 0.7937 | 2.8497 | -293.6164 | -321.4614 | -2.4383 | -2.7388 |
| 0.0529 | 1.56 | 1500 | 0.4066 | -2.3499 | -5.2468 | 0.8016 | 2.8969 | -295.1426 | -322.5153 | -2.3906 | -2.6963 |
| 0.0651 | 1.66 | 1600 | 0.3962 | -2.0597 | -4.8915 | 0.8016 | 2.8318 | -291.5901 | -319.6136 | -2.3390 | -2.6469 |
| 0.0738 | 1.77 | 1700 | 0.3942 | -1.8893 | -4.6107 | 0.8135 | 2.7214 | -288.7817 | -317.9099 | -2.3532 | -2.6607 |
| 0.0597 | 1.87 | 1800 | 0.3990 | -1.8774 | -4.7221 | 0.8175 | 2.8448 | -289.8961 | -317.7905 | -2.2728 | -2.5908 |
| 0.0686 | 1.97 | 1900 | 0.3924 | -1.8745 | -4.6807 | 0.8056 | 2.8062 | -289.4821 | -317.7617 | -2.2554 | -2.5658 |
| 0.0116 | 2.08 | 2000 | 0.4260 | -2.4687 | -5.7190 | 0.7937 | 3.2503 | -299.8647 | -323.7037 | -2.2297 | -2.5347 |
| 0.0114 | 2.18 | 2100 | 0.4519 | -2.8266 | -6.3706 | 0.7976 | 3.5440 | -306.3802 | -327.2823 | -2.2185 | -2.5219 |
| 0.0073 | 2.28 | 2200 | 0.4563 | -2.9422 | -6.5564 | 0.8016 | 3.6142 | -308.2384 | -328.4384 | -2.2103 | -2.5126 |
| 0.0094 | 2.39 | 2300 | 0.4636 | -3.3246 | -7.0542 | 0.8016 | 3.7296 | -313.2165 | -332.2628 | -2.2059 | -2.5081 |
| 0.0056 | 2.49 | 2400 | 0.4745 | -3.3599 | -7.1652 | 0.7976 | 3.8053 | -314.3266 | -332.6161 | -2.1945 | -2.4943 |
| 0.0052 | 2.6 | 2500 | 0.4812 | -3.4916 | -7.3391 | 0.7976 | 3.8475 | -316.0656 | -333.9322 | -2.1888 | -2.4881 |
| 0.0065 | 2.7 | 2600 | 0.4678 | -3.2226 | -6.9887 | 0.7976 | 3.7661 | -312.5613 | -331.2425 | -2.1644 | -2.4560 |
| 0.0059 | 2.8 | 2700 | 0.4694 | -3.4307 | -7.2484 | 0.7976 | 3.8177 | -315.1584 | -333.3234 | -2.1572 | -2.4483 |
| 0.0054 | 2.91 | 2800 | 0.4707 | -3.4959 | -7.3283 | 0.8056 | 3.8324 | -315.9576 | -333.9758 | -2.1575 | -2.4491 |

### Framework versions

- Transformers 4.35.0
- Pytorch 2.1.1+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1

### Evaluation during Training

- Loss: 0.4730
- Rewards/chosen: -3.5289
- Rewards/rejected: -7.3700
- Rewards/accuracies: 0.8016
- Rewards/margins: 3.8412
- Logps/rejected: -316.3751
- Logps/chosen: -334.3053
- Logits/rejected: -2.1644
- Logits/chosen: -2.4556