alpayariyak committed on
Commit fac2446 • 1 Parent(s): d975774

Update README.md

Files changed (1)
  1. README.md +24 -36
README.md CHANGED
@@ -44,48 +44,31 @@ pipeline_tag: text-generation
  </p>

  <hr>
- <div style="background-color: white; padding: 0.7em; border-radius: 0.5em; color: black; display: flex; flex-direction: column; justify-content: center; text-align: center;">
  <a href="https://huggingface.co/openchat/openchat_3.5" style="text-decoration: none; color: black;">
- <span style="font-size: 0.7em; font-family: 'Helvetica'; color: white; background-color:white; border-radius: 6em; padding: 0.04em 0.4em; letter-spacing: 0.1em; font-weight: bold">3.51210</span>
- <span style="font-size: 1.7em; font-family: 'Helvetica'; letter-spacing: 0.1em; font-weight: bold; color: black;">OPENCHAT</span><span style="font-size: 1.8em; font-family: 'Helvetica'; color: #3c72db;">3.5</span>
- <span style="font-size: 0.7em; font-family: 'Helvetica'; color: white; background-color:red; border-radius: 6em; padding: 0.066em 0.4em; letter-spacing: 0.1em; font-weight: bold; vertical-align: top;">1210</span><br>
- <span style="font-size: 2vw; font-family: 'Helvetica'; color: black; white-space: nowrap;">
- 🏆 The Overall Best Performing Open Source 7B Model 🏆
- </span>
- <br> <span style="font-size: 2vw; font-family: 'Helvetica'; color: black; white-space: nowrap;">🤖 Outperforms <span style="font-weight: bold;">ChatGPT</span> (March) and <span style="font-weight: bold;">Grok-1</span> on most benchmarks 🤖</span>
- <br> <span style="font-size: 2vw; font-family: 'Helvetica'; color: black; white-space: nowrap;">🚀 <span style="font-size: 1em; font-family: 'Helvetica'; color: black; font-weight: bold;">15</span>-point improvement in Coding Performance over <span style="font-size: 0.9em;
- font-family: 'Helvetica'; color: black; font-weight: bold;">OpenChat-3.5 🚀</span></span>
- <br><span style="font-size: 2vw; font-family: 'Helvetica'; color: #3c72db; font-weight: bold; white-space: nowrap;">New Features</span>
- <br> <span style="font-size: 2vw; font-family: 'Helvetica'; color: black; white-space: nowrap;">💡 2 Modes: Coding + Generalist, Mathematical Reasoning 💡</span>
- <br><span style="font-size: 2vw; font-family: 'Helvetica'; color: black; white-space: nowrap;"> 🧑‍⚖️ Experimental support for Evaluator and Feedback capabilities 🧑‍⚖️</span>
  </span>
  </a>
  </div>
- <!-- <a href="https://huggingface.co/openchat/openchat_3.5">
- <button class="common-button">Model Repo</button>
- </a>
- <a href="https://openchat.team">
- <button class="common-button">OpenChatUI Demo</button>
- </a>
- <a href="https://huggingface.co/spaces/openchat/openchat_3.5">
- <button class="common-button">HuggingFace Space</button>
- </a>
- <a href="https://arxiv.org/pdf/2309.11235.pdf">
- <button class="common-button">Paper</button>
- </a>
- -->
- </p>
-
  <div style="display: flex; justify-content: center; align-items: center">
- <img src="https://github.com/alpayariyak/openchat/blob/master/assets/1210bench.png?raw=true" style="width: 100%; border-radius: 1em">">
  </div>

  <div>
  <h3> Table of Contents</h3>
  </div>

-
  1. [Usage](#usage)
  2. [Benchmarks](#benchmarks)
  3. [Limitations](#limitations)
@@ -212,6 +195,7 @@ Score 5: {orig_score5_description}
  | OpenOrca Mistral | 7B | 52.7 | 6.86 | 38.4 | 49.4 | 42.9 | 45.9 | 59.3 | 59.1 | 58.1 |
  | Zephyr-β^ | 7B | 34.6 | 7.34 | 22.0 | 40.6 | 39.0 | 40.8 | 39.8 | 5.1 | 16.0 |
  | Mistral | 7B | - | 6.84 | 30.5 | 39.0 | 38.0 | - | 60.1 | 52.2 | - |
  <details>
  <summary>Evaluation Details (click to expand)</summary>
  *: ChatGPT (March) results are from [GPT-4 Technical Report](https://arxiv.org/abs/2303.08774), [Chain-of-Thought Hub](https://github.com/FranxYao/chain-of-thought-hub), and our evaluation. Please note that ChatGPT is not a fixed baseline and evolves rapidly over time.
@@ -226,7 +210,6 @@ All models are evaluated in chat mode (e.g. with the respective conversation tem
  <h3>HumanEval+</h3>
  </div>

-
  | Model | Size | HumanEval+ pass@1 |
  |-----------------------------|----------|------------|
  | ChatGPT (December 12, 2023) | - | 64.6 |
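As context for the pass@1 column above: HumanEval-style scores are usually reported with the unbiased pass@k estimator, which for k = 1 reduces to the fraction of generated samples that pass all tests. A minimal sketch; the sample counts below are invented for illustration and are not this evaluation's actual settings:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples drawn per problem, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# For k=1 this is simply c/n, e.g. 129 passing samples out of 200:
print(round(pass_at_k(200, 129, 1), 3))  # 0.645
```

Per-problem estimates are then averaged over the benchmark to give the reported percentage.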
@@ -247,10 +230,15 @@ All models are evaluated in chat mode (e.g. with the respective conversation tem

  *: Grok results are reported by [X.AI](https://x.ai/).

-
-
-

  <div align="center">
@@ -270,7 +258,6 @@ OpenChat may sometimes generate information that does not exist or is not accura
  **Safety**
  OpenChat may sometimes generate harmful, hate speech, biased responses, or answer unsafe questions. It's crucial to apply additional AI safety measures in use cases that require safe and moderated responses.

- ## License
  <div align="center">
  <h2> License </h2>
  </div>
@@ -296,6 +283,7 @@ OpenChat 3.5 was trained with C-RLFT on a collection of publicly available high-
  <div align="center">
  <h2> Citation </h2>
  </div>
  ```
  @article{wang2023openchat,
  title={OpenChat: Advancing Open-source Language Models with Mixed-Quality Data},
@@ -313,4 +301,4 @@ We extend our heartfelt gratitude to AutoMeta and caesus from Alignment Lab AI,

  Special thanks go to Changling Liu from GPT Desk Pte. Ltd., Qiying Yu at Tsinghua University, Baochang Ma, and Hao Wan from 01.AI company for their generous provision of resources. We are also deeply grateful to Jianxiong Li and Peng Li at Tsinghua University for their insightful discussions.

- Furthermore, we appreciate the developers behind the following projects for their significant contributions to our research: [Mistral](https://mistral.ai/), [Chain-of-Thought Hub](https://github.com/FranxYao/chain-of-thought-hub), [Llama 2](https://ai.meta.com/llama/), [Self-Instruct](https://arxiv.org/abs/2212.10560), [FastChat (Vicuna)](https://github.com/lm-sys/FastChat), [Alpaca](https://github.com/tatsu-lab/stanford_alpaca.git), and [StarCoder](https://github.com/bigcode-project/starcoder). Their work has been instrumental in driving our research forward.
 
  </p>

  <hr>
+ <div style="background-color: white; padding: 0.7em; border-radius: 0.5em; color: black; display: flex; flex-direction: column; justify-content: center; text-align: center; font-size: 0.5em;">
  <a href="https://huggingface.co/openchat/openchat_3.5" style="text-decoration: none; color: black;">
+ <span style="font-size: 0.7em; font-family: 'Helvetica'; color: white; vertical-align: top; background-color:white; border-radius: 6em; padding: 0.04em 0.4em; letter-spacing: 0.1em; font-weight: bold">3.51210</span>
+ <span style="font-size: 1.7em; font-family: 'Helvetica'; letter-spacing: 0.1em; font-weight: bold; color: black;">OPENCHAT</span><span style="font-size: 1.8em; font-family: 'Helvetica'; color: #3c72db;">3.5</span>
+ <span style="font-size: 0.7em; font-family: 'Helvetica'; color: white; vertical-align: top; background-color:red; border-radius: 6em; padding: 0.066em 0.4em; letter-spacing: 0.1em; font-weight: bold;">1210</span>
+ <span style="font-size: 1em; font-family: 'Helvetica'; color: black;">
+ <br> 🏆 The Overall Best Performing Open Source 7B Model 🏆
+ <br> 🤖 Outperforms <span style="font-weight: bold;">ChatGPT</span> (March) and <span style="font-weight: bold;">Grok-1</span> on most benchmarks 🤖
+ <br> 🚀 <span style="font-size: 1em; font-family: 'Helvetica'; color: black; font-weight: bold;">15</span>-point improvement in Coding Performance over <span style="font-size: 0.9em;
+ font-family: 'Helvetica'; color: black; font-weight: bold;">OpenChat-3.5 🚀</span>
+ <br><br><span style="font-size: 1em; font-family: 'Helvetica'; color: #3c72db; font-weight: bold;">New Features</span>
+ <br> 💡 2 Modes: Coding + Generalist, Mathematical Reasoning 💡
+ <br> 🧑‍⚖️ Experimental support for Evaluator and Feedback capabilities 🧑‍⚖️
  </span>
  </a>
  </div>
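The "2 Modes" called out in the banner correspond to two conversation templates. A minimal sketch of prompt construction, assuming the `GPT4 Correct` / `Math Correct` role prefixes and the `<|end_of_turn|>` separator described on the OpenChat 3.5 model card; verify against the Usage section before relying on it:

```python
# Sketch of the two chat modes, assuming the template names and the
# <|end_of_turn|> separator documented for OpenChat 3.5.
def coding_generalist_prompt(user_msg: str) -> str:
    # Default mode: coding + general-purpose chat.
    return f"GPT4 Correct User: {user_msg}<|end_of_turn|>GPT4 Correct Assistant:"

def math_reasoning_prompt(user_msg: str) -> str:
    # Mathematical-reasoning mode uses a different role prefix.
    return f"Math Correct User: {user_msg}<|end_of_turn|>Math Correct Assistant:"

print(coding_generalist_prompt("Hello"))
```

The returned string is what gets tokenized and sent to the model; generation stops at the next `<|end_of_turn|>`.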
63

  <div style="display: flex; justify-content: center; align-items: center">
+ <img src="https://github.com/alpayariyak/openchat/blob/master/assets/1210bench.png?raw=true" style="width: 100%; border-radius: 1em">
  </div>

  <div>
  <h3> Table of Contents</h3>
  </div>

  1. [Usage](#usage)
  2. [Benchmarks](#benchmarks)
  3. [Limitations](#limitations)
 
  | OpenOrca Mistral | 7B | 52.7 | 6.86 | 38.4 | 49.4 | 42.9 | 45.9 | 59.3 | 59.1 | 58.1 |
  | Zephyr-β^ | 7B | 34.6 | 7.34 | 22.0 | 40.6 | 39.0 | 40.8 | 39.8 | 5.1 | 16.0 |
  | Mistral | 7B | - | 6.84 | 30.5 | 39.0 | 38.0 | - | 60.1 | 52.2 | - |
+
  <details>
  <summary>Evaluation Details (click to expand)</summary>
  *: ChatGPT (March) results are from [GPT-4 Technical Report](https://arxiv.org/abs/2303.08774), [Chain-of-Thought Hub](https://github.com/FranxYao/chain-of-thought-hub), and our evaluation. Please note that ChatGPT is not a fixed baseline and evolves rapidly over time.
 
  <h3>HumanEval+</h3>
  </div>

  | Model | Size | HumanEval+ pass@1 |
  |-----------------------------|----------|------------|
  | ChatGPT (December 12, 2023) | - | 64.6 |
 

  *: Grok results are reported by [X.AI](https://x.ai/).

+ <div>
+ <h3>Massive Multitask Language Understanding in Chinese (CMMLU)</h3>
+ 5-shot:
+ </div>

+ | Models | STEM | Humanities | SocialSciences | Other | ChinaSpecific | Avg |
+ |----------|-------|------------|----------------|-------|---------------|-------|
+ | ChatGPT | 47.81 | 55.68 | 56.5 | 62.66 | 50.69 | 55.51 |
+ | OpenChat | 38.7 | 45.99 | 48.32 | 50.23 | 43.27 | 45.85 |
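A note on the "5-shot" setting in the added section above: each test question is prefixed with five solved exemplars before being sent to the model. A minimal, generic sketch; the exemplar format and content here are invented for illustration, not taken from CMMLU:

```python
# Generic 5-shot prompt assembly: five solved (question, answer) exemplars
# followed by the unanswered test question.
def build_five_shot_prompt(exemplars, question):
    """exemplars: list of exactly five (question, answer) pairs."""
    assert len(exemplars) == 5
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in exemplars]
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)

demo = [(f"sample question {i}", f"sample answer {i}") for i in range(5)]
prompt = build_five_shot_prompt(demo, "What is 2+2?")
print(prompt.count("Question:"))  # 6: five exemplars plus the test question
```

The model's continuation after the final "Answer:" is then compared against the gold choice.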


  <div align="center">
 
  **Safety**
  OpenChat may sometimes generate harmful, hate speech, biased responses, or answer unsafe questions. It's crucial to apply additional AI safety measures in use cases that require safe and moderated responses.

  <div align="center">
  <h2> License </h2>
  </div>
 
  <div align="center">
  <h2> Citation </h2>
  </div>
+
  ```
  @article{wang2023openchat,
  title={OpenChat: Advancing Open-source Language Models with Mixed-Quality Data},

  Special thanks go to Changling Liu from GPT Desk Pte. Ltd., Qiying Yu at Tsinghua University, Baochang Ma, and Hao Wan from 01.AI company for their generous provision of resources. We are also deeply grateful to Jianxiong Li and Peng Li at Tsinghua University for their insightful discussions.

+ Furthermore, we appreciate the developers behind the following projects for their significant contributions to our research: [Mistral](https://mistral.ai/), [Chain-of-Thought Hub](https://github.com/FranxYao/chain-of-thought-hub), [Llama 2](https://ai.meta.com/llama/), [Self-Instruct](https://arxiv.org/abs/2212.10560), [FastChat (Vicuna)](https://github.com/lm-sys/FastChat), [Alpaca](https://github.com/tatsu-lab/stanford_alpaca.git), and [StarCoder](https://github.com/bigcode-project/starcoder). Their work has been instrumental in driving our research forward.