dvilasuero committed on
Commit 87ea670
1 Parent(s): 882129c

Update README.md

Files changed (1)
  1. README.md +6 -3

README.md CHANGED
@@ -176,11 +176,11 @@ Notus is a collection of fine-tuned models using Direct Preference Optimization
 
 Following a **data-first** approach, the only difference between Notus-7B-v1 and Zephyr-7B-beta is the preference dataset used for dDPO.
 
-In particular, when we started building [distilabel](https://github.com/argilla-io/distilabel), we took some time to deep-dive into the UltraFeedback dataset. Using [Argilla](https://argilla.io/), we found data issues in the original UltraFeedback dataset that led to high scores for bad responses (more details in the training data section). After curating several hundred data points, we decided to binarize the dataset using the preference ratings instead of the original critique `overall_score`.
+In particular, when we started building [distilabel](https://github.com/argilla-io/distilabel), we invested time understanding and deep-diving into the UltraFeedback dataset. Using [Argilla](https://argilla.io/), we found data issues in the original UltraFeedback dataset that led to high scores for bad responses (more details in the training data section). After curating several hundred data points, we decided to binarize the dataset using the preference ratings instead of the original critique `overall_score`, and verified the new dataset with Argilla.
 
 Using preference ratings instead of critique scores led to a new dataset where the chosen response is different in ~50% of the cases. Using this new dataset with DPO we fine-tuned Notus, a 7B model, that **surpasses Zephyr-7B-beta, Claude 2, and Cohere Command on AlpacaEval**.
 
-This model **wouldn't have been possible without the amazing [Alignment Handbook](https://github.com/huggingface/alignment-handbook)**, and it's based on fruitful discussions with the HuggingFace H4 team. In particular, we used `zephyr-7b-beta`'s recipe, which worked out of the box and enabled us to focus on what we do best: **high-quality data**.
+This model **wouldn't have been possible without the amazing [Alignment Handbook](https://github.com/huggingface/alignment-handbook) and [OpenBMB](https://www.openbmb.cn/home) releasing the UltraFeedback dataset**, and it's based on fruitful discussions with the HuggingFace H4 team. In particular, we used `zephyr-7b-beta`'s recipe, which worked out of the box and enabled us to focus on what we do best: **high-quality data**.
 
 Notus models are intended to be used as assistants via chat-like applications, and are evaluated with Chat (MT-Bench, AlpacaEval) and Academic (Open LLM Leaderboard) benchmarks for a direct comparison with the original Zephyr dDPO model and other 7B models.
 
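As a rough illustration of the ratings-based binarization described above, here is a hypothetical sketch (not the exact distilabel/Argilla curation pipeline). The field names assume the public `openbmb/UltraFeedback` schema (an `instruction`, a list of `completions`, each with per-aspect `annotations` carrying a `Rating`) and may need adjusting; picking the lowest-rated completion as `rejected` is also just one possible design choice:

```python
# Hypothetical sketch of ratings-based binarization; field names are assumptions
# based on the public openbmb/UltraFeedback schema and may differ.
from datasets import load_dataset

ultrafeedback = load_dataset("openbmb/UltraFeedback", split="train")
ultrafeedback = ultrafeedback.filter(lambda row: len(row["completions"]) >= 2)

def mean_rating(completion):
    """Average the per-aspect preference ratings instead of using the critique overall_score."""
    ratings = []
    for aspect in completion["annotations"].values():
        rating = str(aspect.get("Rating", ""))
        if rating.replace(".", "", 1).isdigit():  # skip "N/A" and missing ratings
            ratings.append(float(rating))
    return sum(ratings) / len(ratings) if ratings else 0.0

def binarize(row):
    ranked = sorted(row["completions"], key=mean_rating, reverse=True)
    return {
        "prompt": row["instruction"],
        "chosen": ranked[0]["response"],     # highest mean rating
        "rejected": ranked[-1]["response"],  # lowest mean rating (one possible choice)
    }

binarized = ultrafeedback.map(binarize, remove_columns=ultrafeedback.column_names)
```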
 
@@ -326,7 +326,8 @@ We used a VM with 8 x A100 40GB hosted in Lambda Labs, but while experimenting w
 
 ### Training Data
 
-We used a a new curated version of [`openbmb/UltraFeedback`](https://huggingface.co/datasets/openbmb/UltraFeedback), named [`argilla/ultrafeedback-binarized-preferences`](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences).
+We used a new curated version of [`openbmb/UltraFeedback`](https://huggingface.co/datasets/openbmb/UltraFeedback), named [`argilla/ultrafeedback-binarized-preferences`](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences).
 
 TL;DR
 
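For orientation, here is a minimal sketch of loading that curated dataset and shaping it into the `prompt`/`chosen`/`rejected` triplets that DPO implementations (e.g., the Alignment Handbook recipe or TRL's `DPOTrainer`) typically expect. The source column names below are assumptions, so check the dataset card for the actual schema:

```python
# Minimal sketch, not the exact training pipeline. The source column names are
# assumptions; verify them on the argilla/ultrafeedback-binarized-preferences dataset card.
from datasets import load_dataset

prefs = load_dataset("argilla/ultrafeedback-binarized-preferences", split="train")
print(prefs.column_names)  # inspect the real schema before mapping

def to_dpo_format(row):
    return {
        "prompt": row["instruction"],          # assumed column name
        "chosen": row["chosen_response"],      # assumed column name
        "rejected": row["rejected_response"],  # assumed column name
    }

dpo_dataset = prefs.map(to_dpo_format, remove_columns=prefs.column_names)
```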
 
@@ -342,6 +343,8 @@ While we're working on fixing the original dataset (already narrowed down ~2K pr
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/60420dccc15e823a685f2b03/M9qCKyAB_G1MbVBAPeitd.png)
 
+You can find more details about the dataset analysis and curation on the [ultrafeedback-binarized-preferences dataset card](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences).
+
 ## Prompt template
 
 We use the same prompt template as [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta):
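The template itself is listed in the README; as a usage sketch, assuming the `argilla/notus-7b-v1` tokenizer ships this Zephyr-style chat template, it can be applied with `transformers` rather than formatted by hand:

```python
# Usage sketch: apply the Zephyr-style chat template via the tokenizer instead of
# formatting the prompt by hand. Assumes the argilla/notus-7b-v1 checkpoint.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="argilla/notus-7b-v1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Direct Preference Optimization?"},
]
prompt = generator.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
outputs = generator(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
print(outputs[0]["generated_text"])
```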
 