Commit adb769e
Parent(s): a93afef
Update README.md
README.md CHANGED

@@ -12,7 +12,7 @@ tags:
 ---
 # ⚗️ distilabeled OpenHermes 2.5 Mistral 7B
 
-> A
+> A Neural DPO of OpenHermes 2.5: high quality matters for DPO!
 
 <div>
 <img src="https://cdn-uploads.huggingface.co/production/uploads/60420dccc15e823a685f2b03/yWdvBtKKfJdpdnPiSlNb9.png">
@@ -110,7 +110,7 @@ dataset = dataset.filter(
     not r["in_gsm8k_train"]
 )
 ```
-This resulted in `5,922` instead of `12,859` samples (54% reduction) and
+This resulted in `5,922` instead of `12,859` samples (a 54% reduction), and we ran it for 200 steps (using around 3.2K samples).
 
 ## Benchmark results
 For benchmarking we used the famous "Nous" or "Teknium" benchmark. You can find an overview below, including our first experiment with less ambitious dataset filtering (removing ties and `score>5`).
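
The hunk above shows only the tail of the filtering call. For reference, here is a minimal sketch of what the full step might look like, assuming the `datasets` library and the `argilla/distilabel-intel-orca-dpo-pairs` dataset; the `status` and `chosen_score` fields and the `>= 8` threshold are illustrative assumptions, not confirmed by this commit:

```python
from datasets import load_dataset

# Hypothetical reconstruction: only `not r["in_gsm8k_train"]` is visible in
# the diff above; the dataset name, fields, and threshold are assumptions.
dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")

dataset = dataset.filter(
    lambda r: r["status"] != "tie"    # drop ties between the two responses
    and r["chosen_score"] >= 8        # assumed quality threshold
    and not r["in_gsm8k_train"]       # avoid GSM8K benchmark contamination
)

# The card reports 5,922 of 12,859 samples kept, i.e. 1 - 5922/12859 ≈ 54% removed.
print(f"{len(dataset):,} samples after filtering")
```
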
@@ -118,21 +118,21 @@ For benchmarking we used the famous "Nous" or "Teknium" benchmark. You can find
 For running the benchmark we used another awesome contribution from Maxime: [LLM AutoEval](https://github.com/mlabonne/llm-autoeval), check it out!
 
 
-| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
-
-| [argilla/distilabeled-Hermes-2.5-Mistral-7B](https://huggingface.co/argilla/distilabeled-Hermes-2.5-Mistral-7B) | **44.64** | **73.35** | 55.96 | 42.21 | **54.04** |
-| [dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel](https://huggingface.co/dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel) (first experiment) | 44.27 | 73.3 | **56.26** | **42.25** | 54.02 |
-| mlabonne/NeuralHermes-2.5-Mistral-7B (original recipe) | 43.67 | 73.24 | 55.37 | 41.76 | 53.51 |
-| teknium/OpenHermes-2.5-Mistral-7B | 42.75 | 72.99 | 52.99 | 40.94 | 52.42 |
+| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
+|-------|--------:|--------:|-----------:|---------:|--------:|
+| [argilla/distilabeled-Hermes-2.5-Mistral-7B](https://huggingface.co/argilla/distilabeled-Hermes-2.5-Mistral-7B) | **44.64** | **73.35** | 55.96 | 42.21 | **54.04** |
+| [dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel](https://huggingface.co/dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel) (first experiment) | 44.27 | 73.3 | **56.26** | **42.25** | 54.02 |
+| mlabonne/NeuralHermes-2.5-Mistral-7B (original recipe) | 43.67 | 73.24 | 55.37 | 41.76 | 53.51 |
+| teknium/OpenHermes-2.5-Mistral-7B | 42.75 | 72.99 | 52.99 | 40.94 | 52.42 |
 
 > Update: we now include llm-harness results too!
 
-| Model | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
-
-| [argilla/distilabeled-Hermes-2.5-Mistral-7B](https://huggingface.co/argilla/distilabeled-Hermes-2.5-Mistral-7B) | 66.04 | **85.07** | Pending | 55.96 | **79.56** | **66.34** |
-| [dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel](https://huggingface.co/dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel) | 65.36 | 84.74 | Pending | **56.26** | 79.24 | 65.13 |
-| [mlabonne/NeuralHermes-2.5-Mistral-7B](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B) | **66.55** | 84.90 | **63.32** | 54.93 | 78.30 | 61.30 |
-| [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) | 64.93 | 84.18 | 63.64 | 52.24 | 78.06 | 26.08 |
+| Model | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
+|-------|----:|----------:|-----:|-----------:|-----------:|------:|
+| [argilla/distilabeled-Hermes-2.5-Mistral-7B](https://huggingface.co/argilla/distilabeled-Hermes-2.5-Mistral-7B) | 66.04 | **85.07** | Pending | 55.96 | **79.56** | **66.34** |
+| [dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel](https://huggingface.co/dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel) | 65.36 | 84.74 | Pending | **56.26** | 79.24 | 65.13 |
+| [mlabonne/NeuralHermes-2.5-Mistral-7B](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B) | **66.55** | 84.90 | **63.32** | 54.93 | 78.30 | 61.30 |
+| [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) | 64.93 | 84.18 | 63.64 | 52.24 | 78.06 | 26.08 |
 
 ### Training Hardware
 
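
As a quick sanity check on the Nous benchmark table, the Average column can be recomputed assuming it is the plain arithmetic mean of the four suite scores (an assumption here, not stated in the card):

```python
# Recompute the "Average" column of the Nous benchmark table above, assuming
# it is the arithmetic mean of AGIEval, GPT4All, TruthfulQA, and Bigbench.
rows = {
    "argilla/distilabeled-Hermes-2.5-Mistral-7B": (44.64, 73.35, 55.96, 42.21),
    "teknium/OpenHermes-2.5-Mistral-7B": (42.75, 72.99, 52.99, 40.94),
}
for model, scores in rows.items():
    print(f"{model}: {sum(scores) / len(scores):.2f}")
# Prints 54.04 and 52.42 (up to float rounding), matching the table.
```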