alpayariyak committed on
Commit fac2446 • 1 Parent(s): d975774

Update README.md

Files changed (1)
  1. README.md +24 -36
README.md CHANGED
@@ -44,48 +44,31 @@ pipeline_tag: text-generation
  </p>

  <hr>
- <div style="background-color: white; padding: 0.7em; border-radius: 0.5em; color: black; display: flex; flex-direction: column; justify-content: center; text-align: center;">
  <a href="https://huggingface.co/openchat/openchat_3.5" style="text-decoration: none; color: black;">
- <span style="font-size: 0.7em; font-family: 'Helvetica'; color: white; background-color:white; border-radius: 6em; padding: 0.04em 0.4em; letter-spacing: 0.1em; font-weight: bold">3.51210</span>
- <span style="font-size: 1.7em; font-family: 'Helvetica'; letter-spacing: 0.1em; font-weight: bold; color: black;">OPENCHAT</span><span style="font-size: 1.8em; font-family: 'Helvetica'; color: #3c72db;">3.5</span>
- <span style="font-size: 0.7em; font-family: 'Helvetica'; color: white; background-color:red; border-radius: 6em; padding: 0.066em 0.4em; letter-spacing: 0.1em; font-weight: bold; vertical-align: top;">1210</span><br>
- <span style="font-size: 2vw; font-family: 'Helvetica'; color: black; white-space: nowrap;">
- 🏆 The Overall Best Performing Open Source 7B Model 🏆
- </span>
- <br> <span style="font-size: 2vw; font-family: 'Helvetica'; color: black; white-space: nowrap;">🤖 Outperforms <span style="font-weight: bold;">ChatGPT</span> (March) and <span style="font-weight: bold;">Grok-1</span> on most benchmarks 🤖</span>
- <br> <span style="font-size: 2vw; font-family: 'Helvetica'; color: black; white-space: nowrap;">🚀 <span style="font-size: 1em; font-family: 'Helvetica'; color: black; font-weight: bold;">15</span>-point improvement in Coding Performance over <span style="font-size: 0.9em;
- font-family: 'Helvetica'; color: black; font-weight: bold;">OpenChat-3.5 🚀</span></span>
- <br><span style="font-size: 2vw; font-family: 'Helvetica'; color: #3c72db; font-weight: bold; white-space: nowrap;">New Features</span>
- <br> <span style="font-size: 2vw; font-family: 'Helvetica'; color: black; white-space: nowrap;">💡 2 Modes: Coding + Generalist, Mathematical Reasoning 💡</span>
- <br><span style="font-size: 2vw; font-family: 'Helvetica'; color: black; white-space: nowrap;"> 🧑‍⚖️ Experimental support for Evaluator and Feedback capabilities 🧑‍⚖️</span>
  </span>
  </a>
  </div>
- <!-- <a href="https://huggingface.co/openchat/openchat_3.5">
- <button class="common-button">Model Repo</button>
- </a>
- <a href="https://openchat.team">
- <button class="common-button">OpenChatUI Demo</button>
- </a>
- <a href="https://huggingface.co/spaces/openchat/openchat_3.5">
- <button class="common-button">HuggingFace Space</button>
- </a>
- <a href="https://arxiv.org/pdf/2309.11235.pdf">
- <button class="common-button">Paper</button>
- </a>
- -->
- </p>
-
  <div style="display: flex; justify-content: center; align-items: center">
- <img src="https://github.com/alpayariyak/openchat/blob/master/assets/1210bench.png?raw=true" style="width: 100%; border-radius: 1em">">
  </div>

  <div>
  <h3> Table of Contents</h3>
  </div>

-
  1. [Usage](#usage)
  2. [Benchmarks](#benchmarks)
  3. [Limitations](#limitations)
@@ -212,6 +195,7 @@ Score 5: {orig_score5_description}
  | OpenOrca Mistral | 7B | 52.7 | 6.86 | 38.4 | 49.4 | 42.9 | 45.9 | 59.3 | 59.1 | 58.1 |
  | Zephyr-β^ | 7B | 34.6 | 7.34 | 22.0 | 40.6 | 39.0 | 40.8 | 39.8 | 5.1 | 16.0 |
  | Mistral | 7B | - | 6.84 | 30.5 | 39.0 | 38.0 | - | 60.1 | 52.2 | - |
  <details>
  <summary>Evaluation Details (click to expand)</summary>
  *: ChatGPT (March) results are from [GPT-4 Technical Report](https://arxiv.org/abs/2303.08774), [Chain-of-Thought Hub](https://github.com/FranxYao/chain-of-thought-hub), and our evaluation. Please note that ChatGPT is not a fixed baseline and evolves rapidly over time.
@@ -226,7 +210,6 @@ All models are evaluated in chat mode (e.g. with the respective conversation tem
  <h3>HumanEval+</h3>
  </div>

-
  | Model | Size | HumanEval+ pass@1 |
  |-----------------------------|----------|------------|
  | ChatGPT (December 12, 2023) | - | 64.6 |
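As context for the pass@1 column above: HumanEval-style scores are usually reported with the unbiased pass@k estimator, which for k = 1 reduces to the fraction of generated samples that pass all tests. A minimal sketch; the sample counts below are invented for illustration and are not this evaluation's actual settings:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples drawn per problem, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# For k=1 this is simply c/n, e.g. 129 passing samples out of 200:
print(round(pass_at_k(200, 129, 1), 3))  # 0.645
```

Per-problem estimates are then averaged over the benchmark to give the reported percentage.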
@@ -247,10 +230,15 @@ All models are evaluated in chat mode (e.g. with the respective conversation tem

  *: Grok results are reported by [X.AI](https://x.ai/).

-
-
-

  <div align="center">
@@ -270,7 +258,6 @@ OpenChat may sometimes generate information that does not exist or is not accura
  **Safety**
  OpenChat may sometimes generate harmful, hate speech, biased responses, or answer unsafe questions. It's crucial to apply additional AI safety measures in use cases that require safe and moderated responses.

- ## License
  <div align="center">
  <h2> License </h2>
  </div>
@@ -296,6 +283,7 @@ OpenChat 3.5 was trained with C-RLFT on a collection of publicly available high-
  <div align="center">
  <h2> Citation </h2>
  </div>
  ```
  @article{wang2023openchat,
  title={OpenChat: Advancing Open-source Language Models with Mixed-Quality Data},
@@ -313,4 +301,4 @@ We extend our heartfelt gratitude to AutoMeta and caesus from Alignment Lab AI,

  Special thanks go to Changling Liu from GPT Desk Pte. Ltd., Qiying Yu at Tsinghua University, Baochang Ma, and Hao Wan from 01.AI company for their generous provision of resources. We are also deeply grateful to Jianxiong Li and Peng Li at Tsinghua University for their insightful discussions.

- Furthermore, we appreciate the developers behind the following projects for their significant contributions to our research: [Mistral](https://mistral.ai/), [Chain-of-Thought Hub](https://github.com/FranxYao/chain-of-thought-hub), [Llama 2](https://ai.meta.com/llama/), [Self-Instruct](https://arxiv.org/abs/2212.10560), [FastChat (Vicuna)](https://github.com/lm-sys/FastChat), [Alpaca](https://github.com/tatsu-lab/stanford_alpaca.git), and [StarCoder](https://github.com/bigcode-project/starcoder). Their work has been instrumental in driving our research forward.
 
  </p>

  <hr>
+ <div style="background-color: white; padding: 0.7em; border-radius: 0.5em; color: black; display: flex; flex-direction: column; justify-content: center; text-align: center; font-size: 0.5em;">
  <a href="https://huggingface.co/openchat/openchat_3.5" style="text-decoration: none; color: black;">
+ <span style="font-size: 0.7em; font-family: 'Helvetica'; color: white; vertical-align: top; background-color:white; border-radius: 6em; padding: 0.04em 0.4em; letter-spacing: 0.1em; font-weight: bold">3.51210</span>
+ <span style="font-size: 1.7em; font-family: 'Helvetica'; letter-spacing: 0.1em; font-weight: bold; color: black;">OPENCHAT</span><span style="font-size: 1.8em; font-family: 'Helvetica'; color: #3c72db;">3.5</span>
+ <span style="font-size: 0.7em; font-family: 'Helvetica'; color: white; vertical-align: top; background-color:red; border-radius: 6em; padding: 0.066em 0.4em; letter-spacing: 0.1em; font-weight: bold;">1210</span>
+ <span style="font-size: 1em; font-family: 'Helvetica'; color: black;">
+ <br> 🏆 The Overall Best Performing Open Source 7B Model 🏆
+ <br> 🤖 Outperforms <span style="font-weight: bold;">ChatGPT</span> (March) and <span style="font-weight: bold;">Grok-1</span> on most benchmarks 🤖
+ <br> 🚀 <span style="font-size: 1em; font-family: 'Helvetica'; color: black; font-weight: bold;">15</span>-point improvement in Coding Performance over <span style="font-size: 0.9em;
+ font-family: 'Helvetica'; color: black; font-weight: bold;">OpenChat-3.5 🚀</span>
+ <br><br><span style="font-size: 1em; font-family: 'Helvetica'; color: #3c72db; font-weight: bold;">New Features</span>
+ <br> 💡 2 Modes: Coding + Generalist, Mathematical Reasoning 💡
+ <br> 🧑‍⚖️ Experimental support for Evaluator and Feedback capabilities 🧑‍⚖️
  </span>
  </a>
  </div>
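The "2 Modes" called out in the banner correspond to two conversation templates. A minimal sketch of prompt construction, assuming the `GPT4 Correct` / `Math Correct` role prefixes and the `<|end_of_turn|>` separator described on the OpenChat 3.5 model card; verify against the Usage section before relying on it:

```python
# Sketch of the two chat modes, assuming the template names and the
# <|end_of_turn|> separator documented for OpenChat 3.5.
def coding_generalist_prompt(user_msg: str) -> str:
    # Default mode: coding + general-purpose chat.
    return f"GPT4 Correct User: {user_msg}<|end_of_turn|>GPT4 Correct Assistant:"

def math_reasoning_prompt(user_msg: str) -> str:
    # Mathematical-reasoning mode uses a different role prefix.
    return f"Math Correct User: {user_msg}<|end_of_turn|>Math Correct Assistant:"

print(coding_generalist_prompt("Hello"))
```

The returned string is what gets tokenized and sent to the model; generation stops at the next `<|end_of_turn|>`.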
63

  <div style="display: flex; justify-content: center; align-items: center">
+ <img src="https://github.com/alpayariyak/openchat/blob/master/assets/1210bench.png?raw=true" style="width: 100%; border-radius: 1em">
  </div>

  <div>
  <h3> Table of Contents</h3>
  </div>

  1. [Usage](#usage)
  2. [Benchmarks](#benchmarks)
  3. [Limitations](#limitations)
 
  | OpenOrca Mistral | 7B | 52.7 | 6.86 | 38.4 | 49.4 | 42.9 | 45.9 | 59.3 | 59.1 | 58.1 |
  | Zephyr-β^ | 7B | 34.6 | 7.34 | 22.0 | 40.6 | 39.0 | 40.8 | 39.8 | 5.1 | 16.0 |
  | Mistral | 7B | - | 6.84 | 30.5 | 39.0 | 38.0 | - | 60.1 | 52.2 | - |
+
  <details>
  <summary>Evaluation Details (click to expand)</summary>
  *: ChatGPT (March) results are from [GPT-4 Technical Report](https://arxiv.org/abs/2303.08774), [Chain-of-Thought Hub](https://github.com/FranxYao/chain-of-thought-hub), and our evaluation. Please note that ChatGPT is not a fixed baseline and evolves rapidly over time.
 
  <h3>HumanEval+</h3>
  </div>

  | Model | Size | HumanEval+ pass@1 |
  |-----------------------------|----------|------------|
  | ChatGPT (December 12, 2023) | - | 64.6 |
 

  *: Grok results are reported by [X.AI](https://x.ai/).

+ <div>
+ <h3>Massive Multitask Language Understanding in Chinese (CMMLU)</h3>
+ 5-shot:
+ </div>

+ | Models | STEM | Humanities | SocialSciences | Other | ChinaSpecific | Avg |
+ |----------|-------|------------|----------------|-------|---------------|-------|
+ | ChatGPT | 47.81 | 55.68 | 56.5 | 62.66 | 50.69 | 55.51 |
+ | OpenChat | 38.7 | 45.99 | 48.32 | 50.23 | 43.27 | 45.85 |
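A note on the "5-shot" setting in the added section above: each test question is prefixed with five solved exemplars before being sent to the model. A minimal, generic sketch; the exemplar format and content here are invented for illustration, not taken from CMMLU:

```python
# Generic 5-shot prompt assembly: five solved (question, answer) exemplars
# followed by the unanswered test question.
def build_five_shot_prompt(exemplars, question):
    """exemplars: list of exactly five (question, answer) pairs."""
    assert len(exemplars) == 5
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in exemplars]
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)

demo = [(f"sample question {i}", f"sample answer {i}") for i in range(5)]
prompt = build_five_shot_prompt(demo, "What is 2+2?")
print(prompt.count("Question:"))  # 6: five exemplars plus the test question
```

The model's continuation after the final "Answer:" is then compared against the gold choice.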


  <div align="center">
 
  **Safety**
  OpenChat may sometimes generate harmful, hate speech, biased responses, or answer unsafe questions. It's crucial to apply additional AI safety measures in use cases that require safe and moderated responses.

  <div align="center">
  <h2> License </h2>
  </div>
 
  <div align="center">
  <h2> Citation </h2>
  </div>
+
  ```
  @article{wang2023openchat,
  title={OpenChat: Advancing Open-source Language Models with Mixed-Quality Data},

  Special thanks go to Changling Liu from GPT Desk Pte. Ltd., Qiying Yu at Tsinghua University, Baochang Ma, and Hao Wan from 01.AI company for their generous provision of resources. We are also deeply grateful to Jianxiong Li and Peng Li at Tsinghua University for their insightful discussions.

+ Furthermore, we appreciate the developers behind the following projects for their significant contributions to our research: [Mistral](https://mistral.ai/), [Chain-of-Thought Hub](https://github.com/FranxYao/chain-of-thought-hub), [Llama 2](https://ai.meta.com/llama/), [Self-Instruct](https://arxiv.org/abs/2212.10560), [FastChat (Vicuna)](https://github.com/lm-sys/FastChat), [Alpaca](https://github.com/tatsu-lab/stanford_alpaca.git), and [StarCoder](https://github.com/bigcode-project/starcoder). Their work has been instrumental in driving our research forward.