Commit adb769e
Parent(s): a93afef
Update README.md
README.md CHANGED

@@ -12,7 +12,7 @@ tags:
 ---
 # ⚗️ distilabeled OpenHermes 2.5 Mistral 7B
 
-> A
+> A Neural DPO of OpenHermes 2.5: high quality matters for DPO!
 
 <div>
 <img src="https://cdn-uploads.huggingface.co/production/uploads/60420dccc15e823a685f2b03/yWdvBtKKfJdpdnPiSlNb9.png">
@@ -110,7 +110,7 @@ dataset = dataset.filter(
     not r["in_gsm8k_train"]
 )
 ```
-This resulted in `5,922` instead of `12,859` samples (54% reduction) and
+This resulted in `5,922` instead of `12,859` samples (a 54% reduction), and we ran it for 200 steps (using around 3.2K samples).
 
 ## Benchmark results
 For benchmarking we used the famous "Nous" or "Teknium" benchmark. You can find an overview below, including our first experiment with less ambitious dataset filtering (removing ties and `score>5`).
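
The hunk above shows only the tail of the filtering call. For reference, here is a minimal sketch of what the full step might look like, assuming the `datasets` library and the `argilla/distilabel-intel-orca-dpo-pairs` dataset; the `status` and `chosen_score` fields and the `>= 8` threshold are illustrative assumptions, not confirmed by this commit:

```python
from datasets import load_dataset

# Hypothetical reconstruction: only `not r["in_gsm8k_train"]` is visible in
# the diff above; the dataset name, fields, and threshold are assumptions.
dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")

dataset = dataset.filter(
    lambda r: r["status"] != "tie"    # drop ties between the two responses
    and r["chosen_score"] >= 8        # assumed quality threshold
    and not r["in_gsm8k_train"]       # avoid GSM8K benchmark contamination
)

# The card reports 5,922 of 12,859 samples kept, i.e. 1 - 5922/12859 ≈ 54% removed.
print(f"{len(dataset):,} samples after filtering")
```
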
@@ -118,21 +118,21 @@ For benchmarking we used the famous "Nous" or "Teknium" benchmark. You can find
 For running the benchmark we used another awesome contribution from Maxime: [LLM AutoEval](https://github.com/mlabonne/llm-autoeval), check it out!
 
 
-| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
-
-| [argilla/distilabeled-Hermes-2.5-Mistral-7B](https://huggingface.co/argilla/distilabeled-Hermes-2.5-Mistral-7B) | **44.64** | **73.35** | 55.96 | 42.21 | **54.04** |
-| [dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel](https://huggingface.co/dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel) (first experiment) | 44.27 | 73.3 | **56.26** | **42.25** | 54.02 |
-| mlabonne/NeuralHermes-2.5-Mistral-7B (original recipe) | 43.67 | 73.24 | 55.37 | 41.76 | 53.51 |
-| teknium/OpenHermes-2.5-Mistral-7B | 42.75 | 72.99 | 52.99 | 40.94 | 52.42 |
+| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
+|-------|--------:|--------:|-----------:|---------:|--------:|
+| [argilla/distilabeled-Hermes-2.5-Mistral-7B](https://huggingface.co/argilla/distilabeled-Hermes-2.5-Mistral-7B) | **44.64** | **73.35** | 55.96 | 42.21 | **54.04** |
+| [dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel](https://huggingface.co/dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel) (first experiment) | 44.27 | 73.3 | **56.26** | **42.25** | 54.02 |
+| mlabonne/NeuralHermes-2.5-Mistral-7B (original recipe) | 43.67 | 73.24 | 55.37 | 41.76 | 53.51 |
+| teknium/OpenHermes-2.5-Mistral-7B | 42.75 | 72.99 | 52.99 | 40.94 | 52.42 |
 
 > Update: we now include llm-harness results too!
 
-| Model | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
-
-| [argilla/distilabeled-Hermes-2.5-Mistral-7B](https://huggingface.co/argilla/distilabeled-Hermes-2.5-Mistral-7B) | 66.04 | **85.07** | Pending | 55.96 | **79.56** | **66.34** |
-| [dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel](https://huggingface.co/dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel) | 65.36 | 84.74 | Pending | **56.26** | 79.24 | 65.13 |
-| [mlabonne/NeuralHermes-2.5-Mistral-7B](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B) | **66.55** | 84.90 | **63.32** | 54.93 | 78.30 | 61.30 |
-| [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) | 64.93 | 84.18 | 63.64 | 52.24 | 78.06 | 26.08 |
+| Model | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
+|-------|----:|----------:|-----:|-----------:|-----------:|------:|
+| [argilla/distilabeled-Hermes-2.5-Mistral-7B](https://huggingface.co/argilla/distilabeled-Hermes-2.5-Mistral-7B) | 66.04 | **85.07** | Pending | 55.96 | **79.56** | **66.34** |
+| [dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel](https://huggingface.co/dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel) | 65.36 | 84.74 | Pending | **56.26** | 79.24 | 65.13 |
+| [mlabonne/NeuralHermes-2.5-Mistral-7B](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B) | **66.55** | 84.90 | **63.32** | 54.93 | 78.30 | 61.30 |
+| [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) | 64.93 | 84.18 | 63.64 | 52.24 | 78.06 | 26.08 |
 
 ### Training Hardware
 
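
As a quick sanity check on the Nous benchmark table, the Average column can be recomputed assuming it is the plain arithmetic mean of the four suite scores (an assumption here, not stated in the card):

```python
# Recompute the "Average" column of the Nous benchmark table above, assuming
# it is the arithmetic mean of AGIEval, GPT4All, TruthfulQA, and Bigbench.
rows = {
    "argilla/distilabeled-Hermes-2.5-Mistral-7B": (44.64, 73.35, 55.96, 42.21),
    "teknium/OpenHermes-2.5-Mistral-7B": (42.75, 72.99, 52.99, 40.94),
}
for model, scores in rows.items():
    print(f"{model}: {sum(scores) / len(scores):.2f}")
# Prints 54.04 and 52.42 (up to float rounding), matching the table.
```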