Update README.md
README.md
CHANGED
@@ -12,6 +12,16 @@ This is the DPO'd version of https://huggingface.co/jondurbin/bagel-7b-v0.1
 
 If you are getting too many AALLM or other refusals, even with explicitly human system prompts, you may want to try the non-DPO version.
 
+## Benchmarks
+
+I ran these against the latest main branch of lm-evaluation-harness (and opencompass/FastChat for agieval and mt-bench), since batch size etc. affect scores for some benchmarks.
+
+| model | arc_challenge | boolq | gsm8k | hellaswag | mmlu | openbookqa | piqa | truthful_qa | winogrande |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| bagel | __0.6715__ | 0.8813 | __0.5618__ | 0.8397 | __0.6408__ | __0.51__ | __0.8406__ | __0.6275__ | __0.7561__ |
+| openhermes-2.5 | 0.6476 | __0.8835__ | 0.4852 | __0.8414__ | 0.6347 | 0.498 | 0.8400 | 0.5295 | 0.7443 |
+
+
 ## Data selection.
 
 The first step in the process is creating a dataset.
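For readers who want to reproduce numbers like these, a run could be sketched with the lm-evaluation-harness CLI. This is only an illustrative sketch: the model path, task list, dtype, and batch size below are assumptions, not the exact command the author used, and task names can differ between harness versions.

```shell
# Hypothetical lm-evaluation-harness invocation (assumed arguments, not the
# author's exact command). Note the benchmark text above: batch size can
# affect scores on some tasks, so pin it when comparing models.
lm_eval --model hf \
  --model_args pretrained=jondurbin/bagel-dpo-7b-v0.1,dtype=bfloat16 \
  --tasks arc_challenge,boolq,gsm8k,hellaswag,mmlu,openbookqa,piqa,winogrande \
  --batch_size 8
```

agieval and mt-bench would be run separately (via opencompass and FastChat respectively, per the text above), since they are not part of the same harness.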