dvilasuero committed on
Commit 476767c
1 Parent(s): 70922ab

Update README.md

Files changed (1):
  1. README.md +68 -2
README.md CHANGED
@@ -19,10 +19,76 @@ tags:
  </div>


  | Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average | dpo-pairs | % original pairs |
  |-------------------------------------------------------------------------------------------------------------------|--------:|--------:|-----------:|---------:|--------:|----------:|-----------------:|
  | [argilla/distilabeled-Hermes-2.5-Mistral-7B](https://huggingface.co/argilla/distilabeled-Hermes-2.5-Mistral-7B) | **44.64** | **73.35** | 55.96 | 42.21 | **54.04** | 5,922 | **46%** |
  | [dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel](https://huggingface.co/dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel) (first experiment) | 44.27 | 73.3 | **56.26** | **42.25** | 54.02 | 7,732 | 60% |
  | mlabonne/NeuralHermes-2.5-Mistral-7B (original recipe) | 43.67 | 73.24 | 55.37 | 41.76 | 53.51 | 12,859 | 100% |
- | teknium/OpenHermes-2.5-Mistral-7B | 42.75 | 72.99 | 52.99 | 40.94 | 52.42 | 0 (no DPO) | N/A |
-
 
  </div>


+ ## Introduction
+ This model is the launching partner of our new open dataset [argilla/distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs). It outperforms the awesome `mlabonne/NeuralHermes-2.5-Mistral-7B` using exactly the **same DPO recipe but 54% less data**.
+
+ The dataset is a "distilabeled" version of the widely used [Intel/orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs). The original dataset has been used by hundreds of open-source practitioners and models. We knew from fixing UltraFeedback (and, before that, Alpaca and Dolly) that this dataset could be greatly improved.
+
+ Continuing our mission to build the best alignment datasets for open-source LLMs and the community, we spent a few hours improving it with [distilabel](https://github.com/argilla-io/distilabel).
+
+ The main intuition was this: the original dataset simply assumes that the gpt-4/3.5-turbo responses are always the best, and we know from UltraFeedback that this is not always the case. Moreover, DPO fine-tuning benefits from diversity in the preference pairs.
+
+ This is what it took to build a real preference dataset with distilabel:
+
+ ```python
+ from distilabel.llm import OpenAILLM
+ from distilabel.tasks import UltraFeedbackTask, JudgeLMTask
+ from distilabel.pipeline import Pipeline
+
+ from datasets import load_dataset
+
+ dataset = load_dataset("Intel/orca_dpo_pairs", split="train")
+
+ # this shuffles the chosen/rejected pairs to mitigate positional bias
+ dataset = dataset.map(lambda x: shuffle_and_track(x["chosen"], x["rejected"]))
+
+ # we use our JudgeLM implementation to rate the original pairs
+ labeler = OpenAILLM(
+     task=JudgeLMTask(),
+     model="gpt-4-1106-preview",
+     num_threads=16,
+     max_new_tokens=512,
+ )
+
+ dataset = dataset.rename_columns({"question": "input"})
+
+ distipipe = Pipeline(
+     labeller=labeler
+ )
+
+ # this computes ratings and natural language critiques for each pair
+ ds = distipipe.generate(dataset=dataset, num_generations=2)
+ ```
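+ One note on the snippet above: `shuffle_and_track` is a small helper that is not shown here. A minimal sketch of what it could look like (this exact implementation is an assumption, not necessarily the original helper) is:
+
+ ```python
+ import random
+
+ random.seed(42)
+
+ def shuffle_and_track(chosen, rejected):
+     # present the two responses in a random order and remember which was which,
+     # so the judge's positional bias cannot systematically favor "chosen"
+     pair = [chosen, rejected]
+     random.shuffle(pair)
+     order = ["chosen" if x == chosen else "rejected" for x in pair]
+     return {"generations": pair, "order": order}
+ ```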
+ The resulting dataset is now much more useful: we know which response is preferred (according to gpt-4-turbo), which ones have low scores, and we even have natural-language explanations. But what did we find? Was our intuition confirmed?
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/60420dccc15e823a685f2b03/-V8wY1DYzrtwM9LbGrBXq.png)
+
+ The above chart shows the following:
+
+ * ~4,000 pairs were given the same rating (a tie).
+ * ~7,000 pairs were correct according to our AI judge (`unchanged`).
+ * in ~2,000 cases the originally rejected response was preferred (`swapped`).
+
+ You can reproduce these counts directly from the dataset's `status` column, as sketched below.
+
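+ A quick way to check this yourself (a minimal sketch; it assumes the published dataset exposes the `tie`/`unchanged`/`swapped` labels in its `status` column and the judge's score in `chosen_score`, the same columns used in the filter further below):
+
+ ```python
+ from collections import Counter
+
+ from datasets import load_dataset
+
+ ds = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")
+
+ # how many pairs were tied, confirmed (unchanged), or swapped by the judge
+ print(Counter(ds["status"]))
+
+ # how many pairs have a highly rated chosen response
+ print(sum(1 for s in ds["chosen_score"] if s is not None and s >= 8))
+ ```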
+ Now the next question is: can we build better models with this new knowledge? The answer is "distilabeled OpenHermes", so let's get back to the model!
+
+ ## Training details
+
+ As mentioned above, we kept the same DPO recipe but trained on a smaller, higher-quality subset of the new dataset, selected with the filter below:
+
+ ```python
+ from datasets import load_dataset
+
+ dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")
+
+ # keep only pairs where the judge expressed a clear preference (no ties),
+ # the chosen response scored high, and the prompt is not in the GSM8K train split
+ dataset = dataset.filter(
+     lambda r:
+         r["status"] != "tie" and
+         r["chosen_score"] >= 8 and
+         not r["in_gsm8k_train"]
+ )
+ ```
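+ For completeness, here is a minimal sketch of the DPO step itself using TRL's `DPOTrainer`. It is not the exact training configuration: the hyperparameters, the LoRA settings, and the `prompt`/`chosen`/`rejected` column mapping below are assumptions, and prompt formatting with the model's chat template is omitted:
+
+ ```python
+ from datasets import load_dataset
+ from peft import LoraConfig
+ from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
+ from trl import DPOTrainer
+
+ model_name = "teknium/OpenHermes-2.5-Mistral-7B"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ tokenizer.pad_token = tokenizer.eos_token
+ model = AutoModelForCausalLM.from_pretrained(model_name)
+
+ # same filtered subset as above
+ dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")
+ dataset = dataset.filter(
+     lambda r: r["status"] != "tie" and r["chosen_score"] >= 8 and not r["in_gsm8k_train"]
+ )
+
+ # DPOTrainer expects prompt / chosen / rejected columns (column names assumed here)
+ dataset = dataset.map(
+     lambda r: {"prompt": r["input"], "chosen": r["chosen"], "rejected": r["rejected"]}
+ )
+
+ peft_config = LoraConfig(
+     r=16,
+     lora_alpha=16,
+     lora_dropout=0.05,
+     task_type="CAUSAL_LM",
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
+ )
+
+ training_args = TrainingArguments(
+     output_dir="distilabeled-hermes-dpo",
+     per_device_train_batch_size=2,
+     gradient_accumulation_steps=8,
+     learning_rate=5e-5,
+     max_steps=200,
+     logging_steps=10,
+ )
+
+ trainer = DPOTrainer(
+     model,
+     args=training_args,
+     beta=0.1,
+     train_dataset=dataset,
+     tokenizer=tokenizer,
+     peft_config=peft_config,
+     max_prompt_length=1024,
+     max_length=1536,
+ )
+ trainer.train()
+ ```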
+ ## Benchmark results
  | Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average | dpo-pairs | % original pairs |
  |-------------------------------------------------------------------------------------------------------------------|--------:|--------:|-----------:|---------:|--------:|----------:|-----------------:|
  | [argilla/distilabeled-Hermes-2.5-Mistral-7B](https://huggingface.co/argilla/distilabeled-Hermes-2.5-Mistral-7B) | **44.64** | **73.35** | 55.96 | 42.21 | **54.04** | 5,922 | **46%** |
  | [dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel](https://huggingface.co/dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel) (first experiment) | 44.27 | 73.3 | **56.26** | **42.25** | 54.02 | 7,732 | 60% |
  | mlabonne/NeuralHermes-2.5-Mistral-7B (original recipe) | 43.67 | 73.24 | 55.37 | 41.76 | 53.51 | 12,859 | 100% |
+ | teknium/OpenHermes-2.5-Mistral-7B | 42.75 | 72.99 | 52.99 | 40.94 | 52.42 | 0 (no DPO) | N/A |