Commit 467ec5c (1 parent: 476767c), committed by dvilasuero

Update README.md

Files changed (1): README.md (+40, -3)

README.md
 
## Introduction

This model is the launching partner of our new open dataset [argilla/distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs). It outperforms the awesome `mlabonne/NeuralHermes-2.5-Mistral-7B` with the **exact same DPO recipe but 54% less data**.

The dataset is a "distilabeled" version of the widely used [Intel/orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs). The original dataset has been used by hundreds of open-source practitioners and models. We knew from fixing UltraFeedback (and, before that, Alpacas and Dollys) that this dataset could be significantly improved.

[...]

The above chart shows the following:

* ~7,000 pairs were correct according to our AI judge (`unchanged`),
* and ~2,000 times the rejected response was preferred (`swapped`).
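
If you want to verify these counts yourself, here is a minimal sketch. It assumes the judge's verdict is stored in the dataset's `status` column (the same column the filtering code below relies on):

```python
from collections import Counter

from datasets import load_dataset

dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")

# tally the AI judge's verdict for every pair
print(Counter(dataset["status"]))
```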

Now the next question is: can we build better models with this new knowledge? The answer is "distilabeled Hermes", so let's get back to the model!

> If you love datasets as much as we do, check out the [dataset](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs) and share it with your friends and colleagues.
## Training details

As we did with [Notus](https://argilla.io/blog/notus7b/), we wanted a reproducible recipe to test the impact of data quality.

And we're lucky to have so many amazing folks in the open community contributing reproducible, easy-to-use training scripts and recipes. This time, [Maxime Labonne](https://twitter.com/maximelabonne) had shared a [Colab](https://colab.research.google.com/drive/15iFBr1xWgztXvhrj5I9fBv20c7CFOPBE?usp=sharing) to fine-tune OpenHermes with DPO and Intel's original dataset. Perfect! (Funnily enough, this exact recipe has recently been used to fine-tune the [top-ranked 7B model](https://huggingface.co/CultriX/MistralTrix-v1).)

And that's all for the model part: we reused a good, reproducible recipe.
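
For reference, the core of such a DPO run looks roughly like the sketch below. This is a minimal, illustrative sketch with trl's `DPOTrainer` (the trl API available at the time), not Maxime's exact Colab: the hyperparameters and output directory are placeholders, and the prompt-formatting step assumes the dataset's `input` column can serve as the prompt, whereas the Colab applies a proper ChatML template.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "teknium/OpenHermes-2.5-Mistral-7B"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")
# ...apply the quality filter shown below...

# DPOTrainer expects `prompt`, `chosen`, and `rejected` string columns;
# here we assume the raw `input` column holds the prompt
dataset = dataset.map(
    lambda r: {"prompt": r["input"], "chosen": r["chosen"], "rejected": r["rejected"]}
)

training_args = TrainingArguments(
    output_dir="distilabeled-hermes-dpo",  # placeholder name
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=5e-6,
    max_steps=200,
)

trainer = DPOTrainer(
    model,
    ref_model=None,           # trl builds a frozen reference copy when None
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    beta=0.1,                 # strength of the implicit KL penalty
    max_prompt_length=1024,
    max_length=1536,
)
trainer.train()
```

The recipe itself is deliberately unremarkable: the data filtering below is where this model actually differs from the original NeuralHermes run.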

Once we had created the dataset, the training-data part is also kind of boring: we just filtered the samples based on our intuition, with the goal of reducing the dataset size:

* Ties probably won't help the DPO tuning learn anything meaningful: both responses are similarly good or bad (filter out `ties`).
* Very good chosen responses should steer the model toward generating good responses (keep pairs where the chosen response scores >= 8).

Additionally, we did some "decontamination" of gsm8k prompts (the very few that were present in the train split of gsm8k).

In code, using our new dataset, this translates into:

```python
from datasets import load_dataset

# Instead of this:
# dataset = load_dataset("Intel/orca_dpo_pairs", split="train")

# we did this
dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")

# drop ties, keep only high-scoring chosen responses,
# and remove the few prompts present in the gsm8k train split
dataset = dataset.filter(
    lambda r:
        r["status"] != "tie" and
        r["chosen_score"] >= 8 and
        not r["in_gsm8k_train"]
)
```
This resulted in `5,922` instead of `12,859` samples (a 54% reduction) and led to the following benchmark results.
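
As a quick arithmetic check on that reduction:

```python
# sanity-check the reduction reported above
kept, original = 5_922, 12_859
print(f"{kept:,} of {original:,} pairs kept ({1 - kept / original:.0%} reduction)")
# -> 5,922 of 12,859 pairs kept (54% reduction)
```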

## Benchmark results

For benchmarking, we used the famous "Nous" or "Teknium" benchmark. You can find an overview below, including our first experiment with a less ambitious dataset filtering (removing only ties and keeping pairs with `score>5`).

For running the benchmark we used another awesome contribution from Maxime: [LLM AutoEval](https://github.com/mlabonne/llm-autoeval), check it out!
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average | dpo-pairs | % original pairs |
|-------------------------------------------------------------------------------------------------------------------|--------:|--------:|-----------:|---------:|--------:|----------:|-----------------:|
| [argilla/distilabeled-Hermes-2.5-Mistral-7B](https://huggingface.co/argilla/distilabeled-Hermes-2.5-Mistral-7B) | **44.64** | **73.35** | 55.96 | 42.21 | **54.04** | 5,922 | **46%** |
| [dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel](https://huggingface.co/dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel) (first experiment) | 44.27 | 73.3 | **56.26** | **42.25** | 54.02 | 7,732 | 60% |
| mlabonne/NeuralHermes-2.5-Mistral-7B (original recipe) | 43.67 | 73.24 | 55.37 | 41.76 | 53.51 | 12,859 | 100% |
| teknium/OpenHermes-2.5-Mistral-7B | 42.75 | 72.99 | 52.99 | 40.94 | 52.42 | 0 (no DPO) | N/A |

## Acknowledgements

We'd like to thank the amazing open community and in particular:

* the Intel team, for publishing a great open dataset and showing how well it worked in the first place,
* Teknium and NousResearch, for their awesome work and models,
* and Maxime, for sharing such great resources.