JosephusCheung committed
Commit 97ca638 · 1 parent: 5751faf

Update README.md

Files changed (1)
  1. README.md +12 -0
README.md CHANGED
@@ -101,6 +101,18 @@ Hard ACC:54.71
 
  Win rate **88.26%** on [AlpacaEval Leaderboard](https://tatsu-lab.github.io/alpaca_eval/) [view raw](https://github.com/tatsu-lab/alpaca_eval/blob/3a47dcd81c56f6a8e6a5711f2754013919fbe90a/results/causallm-14b/model_outputs.json)
 
+ ## MT-Bench on DPO Version
+ | Model | MT-Bench |
+ | ------------------------- | ------------ |
+ | GPT-4 | 8.99 |
+ | GPT-3.5-Turbo | 7.94 |
+ | | |
+ | Zephyr-7b-β (Overfitting) | 7.34 |
+ | Zephyr-7b-α | 6.88 |
+ | | |
+ | **[CausalLM/14B-DPO-α](https://huggingface.co/CausalLM/14B-DPO-alpha)** | **7.618868** |
+ | **[CausalLM/7B-DPO-α](https://huggingface.co/CausalLM/7B-DPO-alpha)** | **7.038125** |
+
  ## Other languages
  We are currently unable to produce accurate benchmark templates for non-QA tasks (languages other than English and Chinese). However, we will be working on other language versions of the QA-Task challenge in the near future.
  ### Japanese Benchmark