Update README.md
README.md
CHANGED
@@ -12,6 +12,16 @@ This is the DPO'd version of https://huggingface.co/jondurbin/bagel-7b-v0.1
 
 If you are getting too many AALLM or other refusals, even with explicitly human system prompts, you may want to try the non-DPO version.
 
+## Benchmarks
+
+I ran these against the latest main branch of lm-evaluation-harness (and opencompass/FastChat for agieval and mt-bench), since batch size etc. affect scores for some benchmarks.
+
+| model | arc_challenge | boolq | gsm8k | hellaswag | mmlu | openbookqa | piqa | truthful_qa | winogrande |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| bagel | __0.6715__ | 0.8813 | __0.5618__ | 0.8397 | __0.6408__ | __0.51__ | __0.8406__ | __0.6275__ | __0.7561__ |
+| openhermes-2.5 | 0.6476 | __0.8835__ | 0.4852 | __0.8414__ | 0.6347 | 0.498 | 0.8400 | 0.5295 | 0.7443 |
+
+
 ## Data selection.
 
 The first step in the process is creating a dataset.
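For readers who want to reproduce numbers like these, a run could be sketched with the lm-evaluation-harness CLI. This is only an illustrative sketch: the model path, task list, dtype, and batch size below are assumptions, not the exact command the author used, and task names can differ between harness versions.

```shell
# Hypothetical lm-evaluation-harness invocation (assumed arguments, not the
# author's exact command). Note the benchmark text above: batch size can
# affect scores on some tasks, so pin it when comparing models.
lm_eval --model hf \
  --model_args pretrained=jondurbin/bagel-dpo-7b-v0.1,dtype=bfloat16 \
  --tasks arc_challenge,boolq,gsm8k,hellaswag,mmlu,openbookqa,piqa,winogrande \
  --batch_size 8
```

agieval and mt-bench would be run separately (via opencompass and FastChat respectively, per the text above), since they are not part of the same harness.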