---
model-index:
  - name: notus-7b-v1
    results: []
datasets:
  - argilla/ultrafeedback-binarized-avg-rating-for-dpo
language:
  - en
base_model: alignment-handbook/zephyr-7b-sft-full
library_name: transformers
pipeline_tag: text-generation
tags:
  - dpo
  - preference
  - ultrafeedback
license: apache-2.0
---

Model Card for Notus 7B v1

Image artificially generated with DALL·E 3 via ChatGPT Pro

Notus is a growing collection of models fine-tuned with Direct Preference Optimization (DPO), similar in spirit to Zephyr but focused mainly on the DPO step, aiming to incorporate preference feedback into LLMs during fine-tuning. Notus models are intended to be used as assistants via chat-like applications, and are evaluated with the MT-Bench, AlpacaEval, and LM Evaluation Harness benchmarks, so they can be compared directly with Zephyr models that were also fine-tuned with DPO.
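
For the chat-like usage mentioned above, a minimal inference sketch with Hugging Face Transformers could look as follows; the argilla/notus-7b-v1 repository id, the example messages, and the generation settings are illustrative assumptions rather than an official recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "argilla/notus-7b-v1"  # assumed repository id for this model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Chat messages; the tokenizer's chat template adds the special tokens.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Direct Preference Optimization, in one sentence?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Since the model was fine-tuned from zephyr-7b-sft-full, the chat template applied by the tokenizer is expected to follow the Zephyr prompt format.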

Model Details

Model Description

  • Developed by: Argilla, Inc. (building on the previous work of Hugging Face H4 and Mistral AI)
  • Shared by: Argilla, Inc.
  • Model type: GPT-like 7B model, DPO fine-tuned
  • Language(s) (NLP): Mainly English
  • License: Apache 2.0 (same as Zephyr 7B SFT and Mistral 7B v0.1)
  • Finetuned from model: alignment-handbook/zephyr-7b-sft-full

Model Sources

Model Date

Notus 7B v1 was trained during November 2023. The data, generated by GPT-4 without the use of external resources, has a knowledge cutoff of September 2021.

Evaluation

LM Eval Harness

We ran the evaluation using EleutherAI/lm-eval-harness from the big-refactor branch, aiming to mimic the Open LLM Leaderboard by Hugging Face H4, but running everything on our own VMs instead, as we are still experimenting.
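
For reference, a sketch of how a comparable run could be launched through the harness's Python API; the simple_evaluate entry point, the task name, and the settings below are assumptions based on the big-refactor/0.4.x API and do not reproduce the exact leaderboard configuration, which uses different few-shot counts per task.

```python
import lm_eval

# 25-shot ARC as on the Open LLM Leaderboard; task name, dtype, and batch size
# are assumptions, not the exact configuration we ran.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=argilla/notus-7b-v1,dtype=bfloat16",
    tasks=["arc_challenge"],
    num_fewshot=25,
    batch_size=8,
)
print(results["results"]["arc_challenge"])
```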

From a first evaluation on the benchmark, Notus 7B DPO shows a slight improvement over Zephyr 7B Beta/Alpha and Mistral 7B, as reflected in the average across the 7 leaderboard tasks.

| Model | Average ⬆️ | ARC (25-s) ⬆️ | HellaSwag (10-s) ⬆️ | MMLU (5-s) ⬆️ | TruthfulQA (MC2) (0-s) ⬇️ | Winogrande (5-s) ⬇️ | GSM8K (5-s) ⬆️ | DROP (3-s) ⬇️ |
|---|---|---|---|---|---|---|---|---|
| mistralai/Mistral-7B-v0.1 | 50.32 | 59.58 | 83.31 | 64.16 | 42.15 | 78.37 | 18.12 | 6.14 |
| HuggingFaceH4/zephyr-7b-alpha | 52.4 | 61.01 | 84.04 | 61.39 | 57.9 | 78.61 | 14.03 | 9.82 |
| HuggingFaceH4/zephyr-7b-beta | 52.15 | 62.03 | 84.36 | 61.07 | 57.45 | 77.74 | 12.74 | 9.66 |
| Ours | 54.09 | 64.25 | 84.90 | 61.69 | 52.77 | 74.51 | 39.5 | 0.98 |

We will also add our model to the Open LLM Leaderboard queue so that it is evaluated on Hugging Face's end and we can confirm that the results match, as we found some inconsistencies for DROP when using the big-refactor branch of lm-eval-harness.

MT Bench (Coming soon!)

Alpaca Eval (Coming soon!)

Training Details

Training Hardware

We used a VM with 8 x A100 40GB GPUs hosted on Lambda Labs.

Training Data

We used a slightly curated version of openbmb/UltraFeedback, named argilla/ultrafeedback-binarized-avg-rating-for-dpo.
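
To inspect that preference data, a minimal sketch with the datasets library; the split and column names printed here depend on the dataset itself and are not spelled out in this card.

```python
from datasets import load_dataset

# Curated UltraFeedback variant, binarized into chosen/rejected pairs for DPO.
dataset = load_dataset("argilla/ultrafeedback-binarized-avg-rating-for-dpo")
print(dataset)  # available splits and row counts

# Peek at one record; the split and column layout (prompt/chosen/rejected style)
# are assumptions about the typical DPO format, not guaranteed by this card.
split = next(iter(dataset.keys()))
print(dataset[split].column_names)
print(dataset[split][0])
```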

Training hyperparameters

The following hyperparameters were used during training (a minimal DPO training sketch using these values follows the list):

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 64
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
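
Putting the hyperparameters above together, a minimal sketch of such a DPO run with TRL's DPOTrainer (assuming a TRL release contemporary with the listed Transformers version, where DPOTrainer still takes beta, tokenizer, and length limits directly) could look like this; the beta value, sequence lengths, and dataset split/column layout are assumptions, not values stated in this card.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer  # a TRL release contemporary with Transformers 4.35 is assumed

model_name = "alignment-handbook/zephyr-7b-sft-full"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Preference pairs; the "train" split and prompt/chosen/rejected columns are assumptions.
dataset = load_dataset("argilla/ultrafeedback-binarized-avg-rating-for-dpo")

# Values mirroring the hyperparameter list above (per-device batch size 8 on
# 8 devices gives the reported total train batch size of 64).
training_args = TrainingArguments(
    output_dir="notus-7b-v1-dpo",
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,      # the trainer keeps a frozen copy of the model as reference
    args=training_args,
    beta=0.1,            # assumed value; not listed among the hyperparameters above
    train_dataset=dataset["train"],
    tokenizer=tokenizer,
    max_length=1024,     # assumed sequence limits
    max_prompt_length=512,
)
trainer.train()
```

A run like the one described under Training Hardware would be launched with accelerate or torchrun across the 8 GPUs, so that the per-device batch sizes multiply out to the totals listed above.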

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5051 | 0.1 | 100 | 0.5180 | 0.1475 | -0.3954 | 0.7183 | 0.5429 | -246.6286 | -297.5412 | -2.7438 | -3.0431 |
| 0.4321 | 0.21 | 200 | 0.4375 | 0.1353 | -0.9529 | 0.7540 | 1.0882 | -252.2036 | -297.6632 | -2.7578 | -3.0543 |
| 0.3848 | 0.31 | 300 | 0.4301 | -0.4813 | -1.8921 | 0.7302 | 1.4107 | -261.5956 | -303.8301 | -2.7592 | -3.0508 |
| 0.3777 | 0.42 | 400 | 0.4091 | -0.8597 | -2.5306 | 0.7698 | 1.6709 | -267.9805 | -307.6138 | -2.7476 | -3.0474 |
| 0.3559 | 0.52 | 500 | 0.4332 | -1.0424 | -2.6019 | 0.7619 | 1.5595 | -268.6939 | -309.4406 | -2.2960 | -2.6106 |
| 0.4178 | 0.62 | 600 | 0.3934 | -0.6434 | -2.4837 | 0.7659 | 1.8404 | -267.5121 | -305.4503 | -2.5487 | -2.8508 |
| 0.4206 | 0.73 | 700 | 0.4058 | -1.4700 | -3.5113 | 0.7857 | 2.0413 | -277.7877 | -313.7168 | -2.5679 | -2.8727 |
| 0.4323 | 0.83 | 800 | 0.3929 | -0.9025 | -2.6935 | 0.7897 | 1.7910 | -269.6095 | -308.0414 | -2.6213 | -2.9202 |
| 0.3706 | 0.93 | 900 | 0.3903 | -1.1122 | -3.0257 | 0.8056 | 1.9135 | -272.9316 | -310.1388 | -2.5428 | -2.8416 |
| 0.0496 | 1.04 | 1000 | 0.3991 | -1.4248 | -4.1245 | 0.8016 | 2.6997 | -283.9196 | -313.2651 | -2.5093 | -2.8150 |
| 0.0723 | 1.14 | 1100 | 0.3999 | -1.8789 | -4.5317 | 0.7897 | 2.6528 | -287.9914 | -317.8056 | -2.5170 | -2.8242 |
| 0.0481 | 1.25 | 1200 | 0.4191 | -2.6211 | -5.5294 | 0.7817 | 2.9083 | -297.9687 | -325.2281 | -2.5139 | -2.8109 |
| 0.0432 | 1.35 | 1300 | 0.4070 | -2.0605 | -5.0460 | 0.8056 | 2.9855 | -293.1345 | -319.6214 | -2.5153 | -2.8121 |
| 0.0402 | 1.45 | 1400 | 0.4001 | -2.2445 | -5.0942 | 0.7937 | 2.8497 | -293.6164 | -321.4614 | -2.4383 | -2.7388 |
| 0.0529 | 1.56 | 1500 | 0.4066 | -2.3499 | -5.2468 | 0.8016 | 2.8969 | -295.1426 | -322.5153 | -2.3906 | -2.6963 |
| 0.0651 | 1.66 | 1600 | 0.3962 | -2.0597 | -4.8915 | 0.8016 | 2.8318 | -291.5901 | -319.6136 | -2.3390 | -2.6469 |
| 0.0738 | 1.77 | 1700 | 0.3942 | -1.8893 | -4.6107 | 0.8135 | 2.7214 | -288.7817 | -317.9099 | -2.3532 | -2.6607 |
| 0.0597 | 1.87 | 1800 | 0.3990 | -1.8774 | -4.7221 | 0.8175 | 2.8448 | -289.8961 | -317.7905 | -2.2728 | -2.5908 |
| 0.0686 | 1.97 | 1900 | 0.3924 | -1.8745 | -4.6807 | 0.8056 | 2.8062 | -289.4821 | -317.7617 | -2.2554 | -2.5658 |
| 0.0116 | 2.08 | 2000 | 0.4260 | -2.4687 | -5.7190 | 0.7937 | 3.2503 | -299.8647 | -323.7037 | -2.2297 | -2.5347 |
| 0.0114 | 2.18 | 2100 | 0.4519 | -2.8266 | -6.3706 | 0.7976 | 3.5440 | -306.3802 | -327.2823 | -2.2185 | -2.5219 |
| 0.0073 | 2.28 | 2200 | 0.4563 | -2.9422 | -6.5564 | 0.8016 | 3.6142 | -308.2384 | -328.4384 | -2.2103 | -2.5126 |
| 0.0094 | 2.39 | 2300 | 0.4636 | -3.3246 | -7.0542 | 0.8016 | 3.7296 | -313.2165 | -332.2628 | -2.2059 | -2.5081 |
| 0.0056 | 2.49 | 2400 | 0.4745 | -3.3599 | -7.1652 | 0.7976 | 3.8053 | -314.3266 | -332.6161 | -2.1945 | -2.4943 |
| 0.0052 | 2.6 | 2500 | 0.4812 | -3.4916 | -7.3391 | 0.7976 | 3.8475 | -316.0656 | -333.9322 | -2.1888 | -2.4881 |
| 0.0065 | 2.7 | 2600 | 0.4678 | -3.2226 | -6.9887 | 0.7976 | 3.7661 | -312.5613 | -331.2425 | -2.1644 | -2.4560 |
| 0.0059 | 2.8 | 2700 | 0.4694 | -3.4307 | -7.2484 | 0.7976 | 3.8177 | -315.1584 | -333.3234 | -2.1572 | -2.4483 |
| 0.0054 | 2.91 | 2800 | 0.4707 | -3.4959 | -7.3283 | 0.8056 | 3.8324 | -315.9576 | -333.9758 | -2.1575 | -2.4491 |
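
For readers unfamiliar with the Rewards/* columns above: in DPO they are the implicit rewards, i.e. beta times the gap between the policy and reference log-probabilities of each completion. A minimal sketch of how those quantities and the reported margins/accuracies relate (beta = 0.1 is an assumed value, not listed in this card):

```python
import torch
import torch.nn.functional as F

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Implicit DPO rewards and the margin/accuracy metrics reported above.

    Inputs are the summed log-probabilities of each completion under the policy
    being trained and the frozen reference model; beta=0.1 is an assumed value.
    """
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)        # Rewards/chosen
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)  # Rewards/rejected
    margins = rewards_chosen - rewards_rejected                             # Rewards/margins
    accuracies = (rewards_chosen > rewards_rejected).float().mean()         # Rewards/accuracies
    loss = -F.logsigmoid(margins).mean()                                    # DPO training loss
    return loss, rewards_chosen.mean(), rewards_rejected.mean(), margins.mean(), accuracies
```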

Framework versions

  • Transformers 4.35.0
  • Pytorch 2.1.1+cu121
  • Datasets 2.14.6
  • Tokenizers 0.14.1

Evaluation during Training

  • Loss: 0.4730
  • Rewards/chosen: -3.5289
  • Rewards/rejected: -7.3700
  • Rewards/accuracies: 0.8016
  • Rewards/margins: 3.8412
  • Logps/rejected: -316.3751
  • Logps/chosen: -334.3053
  • Logits/rejected: -2.1644
  • Logits/chosen: -2.4556