jondurbin committed
Commit 959435f
1 Parent(s): 5c083a1

Update README.md

Files changed (1)
  1. README.md +10 -0
README.md CHANGED
@@ -12,6 +12,16 @@ This is the DPO'd version of https://huggingface.co/jondurbin/bagel-7b-v0.1
 
  If you are getting too many AALLM or other refusals, even with explicitly human system prompts, you may want to try the non-DPO version.
 
+ ## Benchmarks
+
+ I ran these against the latest main branch of lm-evaluation-harness (and opencompass/FastChat for agieval and mt-bench), since batch size/etc. affects scores for some benchmarks.
+
+ | model | arc_challenge | boolq | gsm8k | hellaswag | mmlu | openbookqa | piqa | truthful_qa | winogrande |
+ | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+ | bagel | __0.6715__ | 0.8813 | __0.5618__ | 0.8397 | __0.6408__ | __0.51__ | __0.8406__ | __0.6275__ | __0.7561__ |
+ | openhermes-2.5 | 0.6476 | __0.8835__ | 0.4852 | __0.8414__ | 0.6347 | 0.498 | 0.8400 | 0.5295 | 0.7443 |
+
+
  ## Data selection.
 
  The first step in the process is creating a dataset.
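
For reference, results like the table above can be reproduced with the lm-evaluation-harness Python API. This is a minimal sketch only: the task names, dtype, and batch size are illustrative assumptions, since the exact invocation is not given in the commit.

```python
# Minimal sketch of an lm-evaluation-harness (v0.4+) run.
# Task names, dtype, and batch_size are assumptions; the exact settings
# behind the table above are not specified in the commit.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=jondurbin/bagel-dpo-7b-v0.1,dtype=bfloat16",
    tasks=[
        "arc_challenge", "boolq", "gsm8k", "hellaswag", "mmlu",
        "openbookqa", "piqa", "truthfulqa_mc2", "winogrande",
    ],
    batch_size=8,  # batch size can shift scores on some benchmarks
)

# Print the per-task metric dictionaries (acc, acc_norm, etc.).
for task, metrics in results["results"].items():
    print(task, metrics)
```

The agieval and mt-bench numbers mentioned above come from opencompass and FastChat, respectively, and are not covered by this harness call.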