Commit 3455ea2
Parent(s): 20b81a6
Update README.md

README.md CHANGED
@@ -3,7 +3,7 @@ model-index:
 - name: notus-7b-v1
   results: []
 datasets:
-- argilla/ultrafeedback-binarized-
+- argilla/ultrafeedback-binarized-preferences
 language:
 - en
 base_model: alignment-handbook/zephyr-7b-sft-full

@@ -20,15 +20,12 @@ license: mit
 </div>
 
 # Model Card for Notus 7B v1
-
-
-
 Notus is a collection of fine-tuned models using Direct Preference Optimization (DPO) and related RLHF techniques. This model is version 1, fine-tuned with DPO starting from zephyr-7b-beta's SFT model.
 
 Following a **data-first** approach, the only difference between Notus-7B-v1 and Zephyr-7B-beta is the preference dataset used for dDPO. In particular, we found data issues in the original UltraFeedback dataset that led to high scores for bad responses. After curating several hundred data points, we decided to binarize the dataset using the preference ratings instead of the original critique `overall_score`.
 Using preference ratings instead of critique scores led to a new dataset where the chosen response is different in ~50% of the cases.
 
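The ratings-based binarization described above can be sketched roughly as follows. This is an illustrative reconstruction, not Argilla's actual curation code: the field names (`instruction`, `completions`, `rating`, `response`) are assumptions about the UltraFeedback-style schema and may need adapting.

```python
# Illustrative sketch only: binarize UltraFeedback-style records using the
# per-response preference ratings instead of the critique `overall_score`.
# Field names ("instruction", "completions", "rating", "response") are assumed.
import random

from datasets import load_dataset

def binarize(example):
    # Rank the candidate responses by their preference rating (highest first).
    ranked = sorted(example["completions"], key=lambda c: c["rating"], reverse=True)
    chosen = ranked[0]                    # best-rated response becomes "chosen"
    rejected = random.choice(ranked[1:])  # one of the lower-rated responses becomes "rejected"
    return {
        "prompt": example["instruction"],
        "chosen": chosen["response"],
        "rejected": rejected["response"],
    }

raw = load_dataset("openbmb/UltraFeedback", split="train")
raw = raw.filter(lambda ex: len(ex["completions"]) >= 2)  # need at least two candidates
binarized = raw.map(binarize, remove_columns=raw.column_names)
```

Comparing the `chosen` column produced this way against an `overall_score`-based binarization is how the ~50% flip rate mentioned above can be measured.
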
-This model wouldn't have been possible without the amazing [Alignment Handbook](https://github.com/huggingface/alignment-handbook/tree/main/recipes/zephyr-7b-beta) and it's based on fruitful discussions with the H4 team. In particular, we used zephyr-7b-beta's recipe, which worked out-of-the-box and
+This model wouldn't have been possible without the amazing [Alignment Handbook](https://github.com/huggingface/alignment-handbook/tree/main/recipes/zephyr-7b-beta) and it's based on fruitful discussions with the H4 team. In particular, we used zephyr-7b-beta's recipe, which worked out-of-the-box and enabled us to focus on what we do best: **high-quality data**.
 
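As a rough illustration of that recipe, a dDPO run on top of the SFT checkpoint could look like the sketch below, using `trl`'s `DPOTrainer` (API as in late-2023 releases). The hyperparameters, output path, and column mapping are placeholders, not the actual Alignment Handbook configuration.

```python
# Rough sketch of DPO fine-tuning on top of the Zephyr SFT checkpoint with trl.
# Hyperparameters are placeholders; the real run follows the Alignment Handbook recipe.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "alignment-handbook/zephyr-7b-sft-full"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)

# DPOTrainer expects "prompt", "chosen" and "rejected" columns;
# rename/map the dataset first if its columns are named differently.
train_dataset = load_dataset("argilla/ultrafeedback-binarized-preferences", split="train")

args = TrainingArguments(
    output_dir="notus-7b-v1-dpo",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    learning_rate=5e-7,
    num_train_epochs=1,
    bf16=True,
    logging_steps=10,
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=args,
    beta=0.1,  # strength of the KL penalty against the reference model
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    max_length=2048,
    max_prompt_length=1024,
)
trainer.train()
```
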
 Notus models are intended to be used as assistants via chat-like applications, and
 are evaluated with Chat (MT-Bench, AlpacaEval) and Academic (Open LLM Leaderboard) benchmarks for a direct comparison
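For that chat-style use, a minimal inference sketch might look like the following; it assumes the `argilla/notus-7b-v1` checkpoint and the standard `transformers` chat template, and the generation settings are illustrative rather than the model card's official snippet.

```python
# Minimal chat-style usage sketch (settings are illustrative).
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="argilla/notus-7b-v1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain DPO in two sentences."},
]

# The tokenizer's chat template formats the messages into a single prompt string.
prompt = generator.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
output = generator(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)
print(output[0]["generated_text"])
```
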
@@ -54,25 +51,108 @@ with the original Zephyr dDPO model and other 7B models.
 ## Performance
 
 ### Chat benchmarks
-Table adapted from Zephyr-7b-β original
+Table adapted from Zephyr-7b-β's and Starling's original tables for the [MT-Bench](https://huggingface.co/spaces/lmsys/mt-bench) and [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/) benchmarks. Results are sorted by AlpacaEval win rate and omit some >7B models for brevity.
+
+Notus stays on par with Zephyr on MT-Bench, while surpassing Zephyr, Claude 2, and Cohere Command on AlpacaEval, making Notus the most competitive 7B commercial model on AlpacaEval.
+
+| Model | Size | Alignment | MT-Bench (score) | AlpacaEval (win rate %) | License |
+|---|---|---|---|---|---|
+| GPT-4-turbo | - | ? | 9.32 | 97.70 | Proprietary |
+| XwinLM 70b V0.1 | 70B | dPPO | - | 95.57 | LLaMA 2 License |
+| GPT-4 | - | RLHF | 8.99 | 95.03 | Proprietary |
+| Tulu 2+DPO 70B V0.1 | 70B | dDPO | 6.29 | 95.28 | Proprietary |
+| LLaMA2 Chat 70B | 70B | RLHF | 6.86 | 92.66 | LLaMA 2 License |
+| Starling-7B | 7B | C-RLFT + APA | **8.09** | **91.99** | CC-BY-NC-4.0 |
+| **Notus-7b-v1** | 7B | dDPO | 7.30 | 91.42 | MIT |
+| Claude 2 | - | RLHF | 8.06 | 91.36 | Proprietary |
+| Zephyr-7b-β | 7B | dDPO | 7.34 | 90.60 | MIT |
+| Cohere Command | - | RLHF | - | 90.62 | Proprietary |
+| GPT-3.5-turbo | - | RLHF | 7.94 | 89.37 | Proprietary |
 
 ## Academic benchmarks