tags:
  - ultrafeedback
license: mit
---
 
<div align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/60420dccc15e823a685f2b03/CuMO3IjJfymC94_5qd15T.png" alt="Image was artificially generated by Dalle-3 via ChatGPT Pro"/>
</div>

# Model Card for Notus 7B v1

Notus is a collection of fine-tuned models using Direct Preference Optimization (DPO) and related RLHF techniques. This model is the first version, fine-tuned with DPO over `zephyr-7b-sft-full`, which is the SFT model produced to create `zephyr-7b-beta`.
 
Following a **data-first** approach, the only difference between Notus-7B-v1 and Zephyr-7B-beta is the preference dataset used for dDPO. In particular, we found data issues in the original UltraFeedback dataset that led to high scores for bad responses. After curating several hundred data points, we decided to binarize the dataset using the preference ratings instead of the original critique `overall_score`.

Using preference ratings instead of critique scores led to a new dataset where the chosen response is different in ~50% of the cases.
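
To make the binarization step concrete, here is a minimal sketch of rating-based pair construction. It is only an illustration: the field names used below (`instruction`, `completions`, `annotations`, `Rating`) are assumptions about the UltraFeedback schema, and the exact preprocessing lives in the dataset repository. For each prompt, the highest-rated completion becomes `chosen` and a lower-rated one (here, the lowest) becomes `rejected`.

```python
# Illustrative sketch only: binarize one UltraFeedback record by mean preference rating.
# Field names (instruction, completions, annotations, Rating) are assumptions about the
# openbmb/UltraFeedback schema, not taken from this model card.
from statistics import mean

def binarize(example: dict) -> dict:
    def avg_rating(completion: dict) -> float:
        # Each completion carries per-aspect annotations with a numeric "Rating"
        return mean(float(a["Rating"]) for a in completion["annotations"].values())

    ranked = sorted(example["completions"], key=avg_rating, reverse=True)
    return {
        "prompt": example["instruction"],
        "chosen": ranked[0]["response"],     # highest mean preference rating
        "rejected": ranked[-1]["response"],  # a lower-rated alternative
    }
```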

This model wouldn't have been possible without the amazing [Alignment Handbook](https://github.com/huggingface/alignment-handbook), and it is based on fruitful discussions with the Hugging Face H4 team. In particular, we used `zephyr-7b-beta`'s recipe, which worked out-of-the-box and enabled us to focus on what we do best: **high-quality data**.

Notus models are intended to be used as assistants via chat-like applications, and are evaluated with Chat (MT-Bench, AlpacaEval) and Academic (Open LLM Leaderboard) benchmarks for a direct comparison with the original Zephyr dDPO model and other 7B models.

## Model Details

## Performance

### Chat benchmarks

Table adapted from Zephyr-7b-β and Starling's original tables for the [MT-Bench](https://huggingface.co/spaces/lmsys/mt-bench) and [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/) benchmarks. Results are shown sorted by AlpacaEval win rate and omit some >7B models for brevity.

Notus stays on par with Zephyr on MT-Bench, while surpassing Zephyr, Claude 2, and Cohere Command on AlpacaEval, making Notus the most competitive 7B commercial model on AlpacaEval.

</table>

## Academic benchmarks

Results from the [OpenLLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard):

| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K | DROP |
|-------|---------|-----|-----------|------|------------|------------|-------|------|
| Zephyr 7B dDPO (HuggingFaceH4/zephyr-7b-beta) | 52.15 | 62.03 | 84.36 | 61.07 | **57.45** | 77.74 | 12.74 | **9.66** |
| argilla/notus-7b-v1 | **52.89** | **64.59** | **84.78** | **63.03** | 54.37 | **79.4** | **15.16** | 8.91 |

## Training Details

### Training Hardware

We used a VM with 8 x A100 40GB hosted in Lambda Labs, although while experimenting we also explored other cloud providers such as GCP.

### Training Data

We used a new curated version of [`openbmb/UltraFeedback`](https://huggingface.co/datasets/openbmb/UltraFeedback), named [`argilla/ultrafeedback-binarized-preferences`](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences).

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
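
For orientation, here is a minimal sketch of how the hyperparameters above could map onto a `trl`-style DPO run. This is not the actual training script (that is the Alignment Handbook's `zephyr-7b-beta` recipe): the `DPOTrainer` call assumes the v0.7-era API, the `beta` value and sequence lengths are assumptions, and the preference dataset is assumed to already expose `prompt`/`chosen`/`rejected` text columns.

```python
# Sketch of a DPO run with the hyperparameters listed above (assumptions noted inline).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

sft_model = "alignment-handbook/zephyr-7b-sft-full"  # the SFT starting point

model = AutoModelForCausalLM.from_pretrained(sft_model, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(sft_model)

# Assumes the preference data has already been mapped to prompt/chosen/rejected columns
dataset = load_dataset("argilla/ultrafeedback-binarized-preferences")

# TrainingArguments' default AdamW already uses betas=(0.9, 0.999) and epsilon=1e-08
training_args = TrainingArguments(
    output_dir="notus-7b-v1",
    learning_rate=5e-7,              # learning_rate
    per_device_train_batch_size=8,   # train_batch_size (x 8 GPUs = 64 total)
    per_device_eval_batch_size=4,    # eval_batch_size (x 8 GPUs = 32 total)
    num_train_epochs=3,              # num_epochs
    lr_scheduler_type="linear",      # lr_scheduler_type
    warmup_ratio=0.1,                # lr_scheduler_warmup_ratio
    seed=42,
    bf16=True,                       # mixed precision on the A100s
    logging_steps=10,
)

trainer = DPOTrainer(
    model,
    ref_model=None,                  # when None, trl keeps a frozen copy as the reference
    args=training_args,
    beta=0.1,                        # assumed DPO temperature, not listed above
    train_dataset=dataset["train"],
    tokenizer=tokenizer,
    max_length=1024,                 # assumed sequence lengths
    max_prompt_length=512,
)
trainer.train()
```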

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5051 | 0.1 | 100 | 0.5180 | 0.1475 | -0.3954 | 0.7183 | 0.5429 | -246.6286 | -297.5412 | -2.7438 | -3.0431 |
| 0.4321 | 0.21 | 200 | 0.4375 | 0.1353 | -0.9529 | 0.7540 | 1.0882 | -252.2036 | -297.6632 | -2.7578 | -3.0543 |
| 0.3848 | 0.31 | 300 | 0.4301 | -0.4813 | -1.8921 | 0.7302 | 1.4107 | -261.5956 | -303.8301 | -2.7592 | -3.0508 |
| 0.3777 | 0.42 | 400 | 0.4091 | -0.8597 | -2.5306 | 0.7698 | 1.6709 | -267.9805 | -307.6138 | -2.7476 | -3.0474 |
| 0.3559 | 0.52 | 500 | 0.4332 | -1.0424 | -2.6019 | 0.7619 | 1.5595 | -268.6939 | -309.4406 | -2.2960 | -2.6106 |
| 0.4178 | 0.62 | 600 | 0.3934 | -0.6434 | -2.4837 | 0.7659 | 1.8404 | -267.5121 | -305.4503 | -2.5487 | -2.8508 |
| 0.4206 | 0.73 | 700 | 0.4058 | -1.4700 | -3.5113 | 0.7857 | 2.0413 | -277.7877 | -313.7168 | -2.5679 | -2.8727 |
| 0.4323 | 0.83 | 800 | 0.3929 | -0.9025 | -2.6935 | 0.7897 | 1.7910 | -269.6095 | -308.0414 | -2.6213 | -2.9202 |
| 0.3706 | 0.93 | 900 | 0.3903 | -1.1122 | -3.0257 | 0.8056 | 1.9135 | -272.9316 | -310.1388 | -2.5428 | -2.8416 |
| 0.0496 | 1.04 | 1000 | 0.3991 | -1.4248 | -4.1245 | 0.8016 | 2.6997 | -283.9196 | -313.2651 | -2.5093 | -2.8150 |
| 0.0723 | 1.14 | 1100 | 0.3999 | -1.8789 | -4.5317 | 0.7897 | 2.6528 | -287.9914 | -317.8056 | -2.5170 | -2.8242 |
| 0.0481 | 1.25 | 1200 | 0.4191 | -2.6211 | -5.5294 | 0.7817 | 2.9083 | -297.9687 | -325.2281 | -2.5139 | -2.8109 |
| 0.0432 | 1.35 | 1300 | 0.4070 | -2.0605 | -5.0460 | 0.8056 | 2.9855 | -293.1345 | -319.6214 | -2.5153 | -2.8121 |
| 0.0402 | 1.45 | 1400 | 0.4001 | -2.2445 | -5.0942 | 0.7937 | 2.8497 | -293.6164 | -321.4614 | -2.4383 | -2.7388 |
| 0.0529 | 1.56 | 1500 | 0.4066 | -2.3499 | -5.2468 | 0.8016 | 2.8969 | -295.1426 | -322.5153 | -2.3906 | -2.6963 |
| 0.0651 | 1.66 | 1600 | 0.3962 | -2.0597 | -4.8915 | 0.8016 | 2.8318 | -291.5901 | -319.6136 | -2.3390 | -2.6469 |
| 0.0738 | 1.77 | 1700 | 0.3942 | -1.8893 | -4.6107 | 0.8135 | 2.7214 | -288.7817 | -317.9099 | -2.3532 | -2.6607 |
| 0.0597 | 1.87 | 1800 | 0.3990 | -1.8774 | -4.7221 | 0.8175 | 2.8448 | -289.8961 | -317.7905 | -2.2728 | -2.5908 |
| 0.0686 | 1.97 | 1900 | 0.3924 | -1.8745 | -4.6807 | 0.8056 | 2.8062 | -289.4821 | -317.7617 | -2.2554 | -2.5658 |
| 0.0116 | 2.08 | 2000 | 0.4260 | -2.4687 | -5.7190 | 0.7937 | 3.2503 | -299.8647 | -323.7037 | -2.2297 | -2.5347 |
| 0.0114 | 2.18 | 2100 | 0.4519 | -2.8266 | -6.3706 | 0.7976 | 3.5440 | -306.3802 | -327.2823 | -2.2185 | -2.5219 |
| 0.0073 | 2.28 | 2200 | 0.4563 | -2.9422 | -6.5564 | 0.8016 | 3.6142 | -308.2384 | -328.4384 | -2.2103 | -2.5126 |
| 0.0094 | 2.39 | 2300 | 0.4636 | -3.3246 | -7.0542 | 0.8016 | 3.7296 | -313.2165 | -332.2628 | -2.2059 | -2.5081 |
| 0.0056 | 2.49 | 2400 | 0.4745 | -3.3599 | -7.1652 | 0.7976 | 3.8053 | -314.3266 | -332.6161 | -2.1945 | -2.4943 |
| 0.0052 | 2.6 | 2500 | 0.4812 | -3.4916 | -7.3391 | 0.7976 | 3.8475 | -316.0656 | -333.9322 | -2.1888 | -2.4881 |
| 0.0065 | 2.7 | 2600 | 0.4678 | -3.2226 | -6.9887 | 0.7976 | 3.7661 | -312.5613 | -331.2425 | -2.1644 | -2.4560 |
| 0.0059 | 2.8 | 2700 | 0.4694 | -3.4307 | -7.2484 | 0.7976 | 3.8177 | -315.1584 | -333.3234 | -2.1572 | -2.4483 |
| 0.0054 | 2.91 | 2800 | 0.4707 | -3.4959 | -7.3283 | 0.8056 | 3.8324 | -315.9576 | -333.9758 | -2.1575 | -2.4491 |

### Framework versions

- Transformers 4.35.0
- Pytorch 2.1.1+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1

### Evaluation during Training

- Loss: 0.4730
- Rewards/chosen: -3.5289
- Rewards/rejected: -7.3700
- Rewards/accuracies: 0.8016
- Rewards/margins: 3.8412
- Logps/rejected: -316.3751
- Logps/chosen: -334.3053
- Logits/rejected: -2.1644
- Logits/chosen: -2.4556
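
For context on how to read these metrics, they follow the usual DPO conventions as reported by `trl`: the "rewards" are implicit, beta-scaled log-probability ratios between the fine-tuned policy and the frozen reference model. A sketch of the relevant quantities:

```latex
% DPO loss on a preference pair (x, y_w chosen, y_l rejected); beta is the DPO temperature
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\log \sigma\!\left(
    \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
  - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right)

% Reported metrics (sketch):
%   rewards/chosen     = beta * (log pi_theta(y_w|x) - log pi_ref(y_w|x))
%   rewards/rejected   = beta * (log pi_theta(y_l|x) - log pi_ref(y_l|x))
%   rewards/margins    = rewards/chosen - rewards/rejected
%   rewards/accuracies = fraction of pairs where rewards/chosen > rewards/rejected
```

In other words, the growing margins and the ~0.80 accuracy above mean the policy increasingly prefers the chosen responses over the rejected ones relative to the SFT reference, while the `Logps/*` and `Logits/*` columns track the raw log-probabilities and logits on each side.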
 
## Prompt template

We use the same prompt template as [`HuggingFaceH4/zephyr-7b-beta`](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta):

```
<|system|>
</s>
<|user|>
{prompt}</s>
<|assistant|>
```
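
If you want to inspect the exact prompt string the model will see, you can render it from the tokenizer's bundled chat template; this mirrors what the usage examples below do, and the message contents here are just placeholders:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("argilla/notus-7b-v1")

messages = [
    {"role": "system", "content": ""},  # the template above shows an empty system turn
    {"role": "user", "content": "What is Notus 7B?"},
]
# tokenize=False returns the formatted string instead of token ids
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```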

## Usage

You will first need to install `transformers` and `accelerate` (just to ease the device placement), and then you can run any of the following:

### Via `generate`

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("argilla/notus-7b-v1", torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("argilla/notus-7b-v1")

messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant super biased towards Argilla, a data annotation company.",
    },
    {"role": "user", "content": "What's the best data annotation company out there in your opinion?"},
]
inputs = tokenizer.apply_chat_template(messages, tokenize=True, return_tensors="pt", add_special_tokens=False, add_generation_prompt=True)
outputs = model.generate(inputs, num_return_sequences=1, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
```
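
Note that `outputs[0]` still contains the prompt tokens, so if you only want the assistant's reply you can decode just the newly generated part, e.g. `tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)`.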

### Via `pipeline` method

```python
import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="argilla/notus-7b-v1", torch_dtype=torch.bfloat16, device_map="auto")

messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant super biased towards Argilla, a data annotation company.",
    },
    {"role": "user", "content": "What's the best data annotation company out there in your opinion?"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
generated_text = outputs[0]["generated_text"]
```
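
Keep in mind that the `pipeline` output's `generated_text` includes the rendered prompt as well; pass `return_full_text=False` when calling `pipe(...)` if you only want the completion.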