LoneStriker/SUS-Chat-34B-6.0bpw-h6-exl2

@@ -7,16 +7,21 @@ widget:
     text: hi
     output:
       text: ' Hello! How can I assist you today?'
 pipeline_tag: text-generation
 ---
-# 🐗SUS-Chat: Instruction tuning done right
 <div align="center">
 <p align="center">
-<img width="200px" src="https://github.com/SUSTech-IDEA/SUS-Chat/raw/main/assets/sustech.svg?sanitize=true">
 </p>
 <div style="display: inline-block;">
@@ -38,7 +43,7 @@ pipeline_tag: text-generation
 <div style="display: inline-block;">
 <a rel="noopener nofollow" href="https://www.modelscope.cn/organization/sustc/">
-<img src="https://img.shields.io/badge/ModelScope-sustec-blue" style="margin: 0 0;">
 </a>
 </div>
@@ -53,7 +58,7 @@ pipeline_tag: text-generation
 <div style="display: inline-block;">
-<a rel="noopener nofollow" href="https://github.com/SUSTech-IDEA/SUS-Chat/blob/main/MODEL_LICENSE_AGREEMENT.txt">
 <img src="https://img.shields.io/badge/Model_License-Model_Agreement-lightblue" style="margin: 0 0;">
 </a>
@@ -69,54 +74,263 @@ pipeline_tag: text-generation
 </div>
-# Inrtoduction
-<img src="https://hackmd.io/_uploads/S1dXCTIHp.png" id="fig-sus"
-alt="Figure 1: DALL·E 2023-12-01 11.03.28 - An imposing, majestic wild boar combined with elements of a futuristic transformer robot. The boar itself should be intricately blended with these tra" />
-**SUS-Chat**
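-is a 34B bilingual Chinese-English dialogue model, jointly released by the Southern University of Science and Technology and the International Digital Economy Academy (IDEA). SUS-Chat-34B was fine-tuned on millions of high-quality, multilingual instruction examples. While maintaining the strong language capabilities of the base model, high-quality instruction fine-tuning improved how it responds to human instructions, and it excels at imitating human thought processes through chains of thought.
-It surpasses all models of the same size on almost all benchmarks and is better suited to the practical needs of complex multilingual tasks. Compared with larger models, SUS-Chat-34B remains highly competitive and achieved state-of-the-art performance in our comprehensive evaluation.
-SUS-Chat strongly demonstrates that, with the right instruction fine-tuning, academic institutions can achieve better performance using open-source datasets and models, without increasing model parameters.
-This bridges the gap between academia and industry in large language models and opens new possibilities for collaboration between them.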
 # Performance
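-To better evaluate the performance of SUS-Chat-34B, we evaluated it on multiple benchmarks and open-sourced the evaluation framework [TLEM](https://huggingface.co/spaces/SUSTech/tlem) so that other researchers can reproduce and compare the results.
-In TLEM, we used multiple benchmarks, including MMLU, CMMLU, C-Eval, BBH,
-GSM-8K, and MATH,
-which focus on measuring the model's knowledge and reasoning ability; on these metrics SUS-Chat-34B achieved state-of-the-art performance. We additionally used [lm-eval](https://github.com/EleutherAI/lm-evaluation-harness) to test SUS-Chat and comparable models on winogrande,
-hellaswag, arc, and truthful-qa, measuring common-sense reasoning ability and hallucination.
-Overall, SUS-Chat-34B significantly outperforms models of similar scale and achieves state-of-the-art overall performance.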
-| model             | mmlu-chat | cmmlu-chat | ceval-chat | gsm8k |   BBH |  MATH | winogrande |   arc | hellaswag | truthfulqa | average |
-|:------------------|----------:|-----------:|-----------:|------:|------:|------:|-----------:|------:|----------:|-----------:|--------:|
-| GPT-4             |        83 |         71 |       69.9 |  91.4 |  86.7 |  45.8 |       87.5 |  94.5 |      91.4 |        nan | 80.1333 |
-| SUS-Chat-34B      |     77.35 |      78.68 |      82.42 | 80.06 | 67.62 |  28.8 |      81.22 | 81.54 |     83.79 |      57.47 |  71.895 |
-| Qwen-72B-Chat     |     74.52 |      77.02 |      77.22 | 76.57 | 72.63 |  35.9 |      80.58 | 81.29 |     87.02 |      50.64 |  71.339 |
-| DeepSeek-67B-Chat |     69.43 |      48.51 |       59.7 | 74.45 | 69.73 | 29.56 |      76.09 |  82.1 |     86.06 |      56.37 |    65.2 |
-| OrionStar-34B     |     68.51 |      66.88 |      65.13 | 54.36 | 62.88 |  12.8 |      77.27 | 80.19 |     84.54 |      53.24 |   62.58 |
-| Yi-34B-Chat       |     66.96 |      55.16 |      77.16 | 63.76 | 61.54 | 10.02 |      76.64 | 70.66 |     82.29 |      54.57 |  61.876 |
-<img src="assets/radar.png" id="fig-bench" alt="Figure 2: Benchmark" />
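-# Usage
-SUS-Chat-34B is a standard LLaMA model; its usage and development environment are the same as for most other open-source models. Multi-turn dialogue can be run as follows: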
-``` python
-from transformers import AutoModelForCausalLM, AutoTokenizer
 def chat_template(messages):
     history = ""
     for message in messages:
         match message:
-            case {"role": "human", "content": message}:
                 history += f"### Human: {message}\n\n### Assistant: "
             case {"role": "assistant", "content": message}:
                 history += message
@@ -132,10 +346,12 @@ model = AutoModelForCausalLM.from_pretrained(
 messages = [{"role": "user", "content": "hi"}]
-input_ids = tokenizer.encode(chat_template(messages), return_tensors="pt").to("cuda")
-output_ids = model.generate(input_ids.to("cuda"))
 response = tokenizer.decode(
-    output_ids[0][input_ids.shape[1] :], skip_special_tokens=True
 )
 messages.append({"role": "assistant", "content": response})
@@ -144,25 +360,42 @@ messages.append({"role": "assistant", "content": response})
 messages.append({"role": "user", "content": "What is the capital of China?"})
-input_ids = tokenizer.encode(chat_template(messages), return_tensors="pt").to("cuda")
-output_ids = model.generate(input_ids.to("cuda"))
 response = tokenizer.decode(
-    output_ids[0][input_ids.shape[1] :], skip_special_tokens=True
 )
 messages.append({"role": "assistant", "content": response})
 ```
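-# Limitations
-SUS-Chat has only undergone supervised fine-tuning and has not yet undergone human preference learning, so it may produce unreasonable responses in some situations and amplify existing problems of language models,
-including hallucination, non-determinism, and cumulative error.
-To achieve better performance on downstream tasks, we recommend adjusting the generation configuration parameters accordingly.
-# Disclaimer
-During training we used data compliance checking algorithms to ensure, as far as possible, the compliance of the trained model. Because the data is complex and language models are used in diverse scenarios, we cannot guarantee that the model will generate correct and reasonable output in all cases. Please note that the model still carries a risk of producing problematic output. We accept no responsibility for any risks or problems arising from misuse, misguidance, illegal use, and related misinformation, or from related data security issues.
-# License
-This model is fully open for academic research and free commercial use, but it must comply with the [license](https://github.com/SUSTech-IDEA/SUS-Chat/blob/main/MODEL_LICENSE_AGREEMENT.txt) from 01.AI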

     text: hi
     output:
       text: ' Hello! How can I assist you today?'
 pipeline_tag: text-generation
 ---
+# 🐷SUS-Chat: Instruction tuning done right
+<p align="left">
+<a href="README_CN.md">中文</a>&nbsp; ｜ &nbsp;English&nbsp;
+</p>
+<br><br>
 <div align="center">
 <p align="center">
+<img src="https://github.com/SUSTech-IDEA/SUS-Chat/raw/main/assets/sustech.svg?sanitize=true" width="200px">
+<img src="https://github.com/SUSTech-IDEA/SUS-Chat/raw/main/assets/ccnl.png?sanitize=true" width="200px">
 </p>
 <div style="display: inline-block;">
 <div style="display: inline-block;">
 <a rel="noopener nofollow" href="https://www.modelscope.cn/organization/sustc/">
+<img src="https://img.shields.io/badge/🤖ModelScope-sustc-blue" style="margin: 0 0;">
 </a>
 </div>
 <div style="display: inline-block;">
+<a rel="noopener nofollow" href="https://github.com/01-ai/Yi/blob/main/MODEL_LICENSE_AGREEMENT.txt">
 <img src="https://img.shields.io/badge/Model_License-Model_Agreement-lightblue" style="margin: 0 0;">
 </a>
 </div>
+# News
+- 2023-12-06: Try [SUS-Chat-34B
+  chat-ui](https://huggingface.co/spaces/SUSTech/SUS-Chat-34B).
+- 2023-12-05: SUS-Chat-34B is now available on
+  [ModelScope🤖](https://www.modelscope.cn/models/SUSTC/SUS-Chat-34B/summary)
+- 2023-12-05: SUS-Chat-34B is ranked 2nd in [Open LLM
+  leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+  and surpassed all models under 70B.
+- 2023-12-01: SUS-Chat-34B is now available on
+  [HuggingFace🤗](https://huggingface.co/SUSTech/SUS-Chat-34B).
+# Introduction
+<img src="https://hackmd.io/_uploads/HJlDtzhBa.png" id="fig-sus"
+alt="Figure 1: DALL·E 2023-12-01 11.03.28 - An imposing, majestic wild boar combined with elements of a futuristic transformer robot. The boar itself should be intricately blended with these tra" />
+**SUS-Chat-34B** is a 34B bilingual Chinese-English dialogue model,
+jointly released by the **[Southern University of Science and
+Technology](https://huggingface.co/SUSTech)** and
+**[IDEA-CCNL](https://huggingface.co/IDEA-CCNL)**. This model is based
+on [`01-ai/Yi-34B`](https://huggingface.co/01-ai/Yi-34B) and has been
+fine-tuned on millions of high-quality, multilingual instruction examples.
+While maintaining the strong language capabilities of the base model,
+the SUS-Chat-34B model has improved the model’s response to human
+instructions through high-quality instruction fine-tuning and excels at
+imitating human thought processes through chains of thought. It
+introduces inter-instruction attention sharing in long texts, expanding
+the window size from 4K to 8K, significantly enhancing the usability of
+multi-turn dialogues.
+It has surpassed all models of the same size in almost all benchmark
+tests and is better suited to meet the practical needs of complex
+multilingual tasks. Compared to larger models, SUS-Chat-34B remains
+highly competitive and has achieved state-of-the-art performance in our
+comprehensive evaluations.
+The SUS-Chat-34B model has the following highlights:
+1.  Large-scale complex instruction-following data: Trained with 1.4
+    billion tokens of high-quality complex instruction data, covering
+    Chinese and English, multi-turn dialogues, mathematics, reasoning,
+    and various other types of instruction data;
+2.  Strong performance in general tasks: The SUS-Chat-34B model excels
+    in numerous mainstream Chinese and English tasks, surpassing other
+    open-source instruction fine-tuned models of the same parameter
+    scale. It also competes well against models with larger parameter
+    scales;
+3.  Longer context window and excellent multi-turn dialogue
+    capabilities: SUS-Chat-34B currently supports an 8K context window
+    and is trained on a large amount of multi-turn instruction data and
+    mixed single- and multi-turn data, giving it strong capabilities in
+    focusing on relevant information in long dialogues and in following
+    instructions.
+SUS-Chat powerfully demonstrates that through the right instruction
+fine-tuning, academic institutions can achieve better performance
+without increasing model parameters, using open-source datasets and
+models. This bridges the gap between academia and industry in large
+language models and opens new possibilities for collaboration between
+academic and industrial sectors.
 # Performance
+To better evaluate the performance of the SUS-Chat-34B model, we
+conducted assessments across multiple benchmark tests and have
+open-sourced the evaluation framework
+[TLEM](https://huggingface.co/spaces/SUSTech/tlem) to facilitate
+replication and comparison by other researchers.
+In TLEM, we utilized various benchmark tests including MMLU, CMMLU,
+C-Eval, BBH, GSM-8K, and MATH, to measure the model’s knowledge and
+thinking capabilities. In these metrics, the SUS-Chat-34B model achieved
+state-of-the-art performance. Additionally, we incorporated
+[lm-eval](https://github.com/EleutherAI/lm-evaluation-harness) to test
+SUS-Chat and similar models on winogrande, hellaswag, arc, and
+truthful-qa, assessing the model’s common-sense reasoning ability and
+susceptibility to hallucination.
+Overall, the SUS-Chat-34B model significantly outperformed models of
+similar scale and achieved state-of-the-art overall performance.
+<img
+src="https://github.com/SUSTech-IDEA/SUS-Chat/raw/main/assets/radar.png"
+id="fig-bench" alt="Figure 2: Benchmark" />
+<div>
+<table>
+<colgroup>
+<col style="width: 50%" />
+<col style="width: 50%" />
+</colgroup>
+<tbody>
+<tr class="odd">
+<td style="text-align: center;"><div width="50.0%"
+data-layout-align="center">
+<h2 id="english-understanding">English Understanding</h2>
+<table>
+<thead>
+<tr class="header">
+<th style="text-align: right;">Model</th>
+<th style="text-align: center;">mmlu (0-shot)</th>
+</tr>
+</thead>
+<tbody>
+<tr class="odd">
+<td style="text-align: right;">GPT-4</td>
+<td style="text-align: center;">83</td>
+</tr>
+<tr class="even">
+<td style="text-align: right;">SUS-Chat-34B</td>
+<td style="text-align: center;"><u>74.35</u></td>
+</tr>
+<tr class="odd">
+<td style="text-align: right;">Qwen-72b-Chat</td>
+<td style="text-align: center;"><strong>74.52</strong></td>
+</tr>
+<tr class="even">
+<td style="text-align: right;">DeepSeek-67B-Chat</td>
+<td style="text-align: center;">69.43</td>
+</tr>
+<tr class="odd">
+<td style="text-align: right;">OrionStar-Yi-34B-Chat</td>
+<td style="text-align: center;">68.51</td>
+</tr>
+<tr class="even">
+<td style="text-align: right;">Yi-34B-Chat</td>
+<td style="text-align: center;">66.96</td>
+</tr>
+</tbody>
+</table>
+</div></td>
+<td style="text-align: center;"><div width="50.0%"
+data-layout-align="center">
+<h2 id="chinese-capabilities">Chinese Capabilities</h2>
+<table>
+<colgroup>
+<col style="width: 34%" />
+<col style="width: 32%" />
+<col style="width: 32%" />
+</colgroup>
+<thead>
+<tr class="header">
+<th style="text-align: right;">Model</th>
+<th style="text-align: center;">cmmlu (0-shot)</th>
+<th style="text-align: center;">C-Eval (0-shot)<a href="#fn1"
+class="footnote-ref" id="fnref1"
+role="doc-noteref"><sup>1</sup></a></th>
+</tr>
+</thead>
+<tbody>
+<tr class="odd">
+<td style="text-align: right;">GPT-4</td>
+<td style="text-align: center;">71</td>
+<td style="text-align: center;">69.9</td>
+</tr>
+<tr class="even">
+<td style="text-align: right;">SUS-Chat-34B</td>
+<td style="text-align: center;"><strong>78.68</strong></td>
+<td style="text-align: center;"><strong>82.42</strong></td>
+</tr>
+<tr class="odd">
+<td style="text-align: right;">Qwen-72b-Chat</td>
+<td style="text-align: center;"><u>77.02</u></td>
+<td style="text-align: center;"><u>77.22</u></td>
+</tr>
+<tr class="even">
+<td style="text-align: right;">DeepSeek-67B-Chat</td>
+<td style="text-align: center;">48.51</td>
+<td style="text-align: center;">59.7</td>
+</tr>
+<tr class="odd">
+<td style="text-align: right;">OrionStar-Yi-34B-Chat</td>
+<td style="text-align: center;">66.88</td>
+<td style="text-align: center;">65.13</td>
+</tr>
+<tr class="even">
+<td style="text-align: right;">Yi-34B-Chat</td>
+<td style="text-align: center;">55.16</td>
+<td style="text-align: center;">77.16</td>
+</tr>
+</tbody>
+</table>
+</div></td>
+</tr>
+</tbody>
+</table>
+<section id="footnotes" class="footnotes footnotes-end-of-document"
+role="doc-endnotes">
+<hr />
+<ol>
+<li id="fn1"><p>C-Eval results are evaluated on the validation
+datasets<a href="#fnref1" class="footnote-back"
+role="doc-backlink">↩︎</a></p></li>
+</ol>
+</section>
+</div>
+## Math & Reasoning
+|                 Model | gsm8k (0-shot) | MATH (0-shot) | BBH (0-shot) |
+|----------------------:|:--------------:|:-------------:|:------------:|
+|                 GPT-4 |      91.4      |     45.8      |     86.7     |
+|          SUS-Chat-34B |   **80.06**    |     28.7      |    67.62     |
+|         Qwen-72b-Chat |  <u>76.57</u>  |   **35.9**    |  **72.63**   |
+|     DeepSeek-67B-Chat |     74.45      | <u>29.56</u>  | <u>69.73</u> |
+| OrionStar-Yi-34B-Chat |     54.36      |     12.8      |    62.88     |
+|           Yi-34B-Chat |     63.76      |     10.02     |    61.54     |
+## More Tasks
+|                 Model | winogrande (5-shot) | arc (25-shot) | hellaswag (10-shot) | TruthfulQA mc1 (0-shot) | TruthfulQA mc2 (0-shot) |
+|----------------------:|:-------------------:|:-------------:|:-------------------:|:-----------------------:|:-----------------------:|
+|                 GPT-4 |          —          |     94.5      |        91.4         |          59.00          |            —            |
+|          SUS-Chat-34B |      **81.22**      | <u>81.54</u>  |        83.79        |        **40.64**        |        **57.47**        |
+|         Qwen-72b-Chat |        76.09        |   **82.10**   |    <u>86.06</u>     |          39.17          |      <u>56.37</u>       |
+|     DeepSeek-67B-Chat |    <u>80.58</u>     |     81.29     |      **87.02**      |      <u>40.02</u>       |          50.64          |
+| OrionStar-Yi-34B-Chat |        77.27        |     80.19     |        84.54        |          36.47          |          53.24          |
+|           Yi-34B-Chat |        76.64        |     70.66     |        82.29        |          38.19          |          54.57          |
+## Overall
+|                 Model |  Average  |
+|----------------------:|:---------:|
+|          SUS-Chat-34B | **69.05** |
+|         Qwen-72b-Chat |   68.41   |
+|     DeepSeek-67B-Chat |   62.91   |
+| OrionStar-Yi-34B-Chat |   60.21   |
+|           Yi-34B-Chat |   59.72   |
+To reproduce the results, start a vLLM server for the model under test
+and follow the
+[TLEM quick-start instructions](https://sustech-tlem.static.hf.space/index.html#start-evaluating-your-model-in-3-line).
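The README does not show what talking to such a server looks like, so the following is only a rough sketch. It assumes a vLLM OpenAI-compatible server is already running locally; the port (`8000`), the served model name, the `stop` string, and the helper names `build_completion_request` and `complete` are illustrative assumptions, not part of TLEM or of this README. The prompt string follows the `### Human:` / `### Assistant:` format from the Usage example.

```python
import json
import urllib.request

# Assumptions (not from the README): a vLLM OpenAI-compatible server is
# listening on localhost:8000 and serves the model under this name.
BASE_URL = "http://localhost:8000/v1"
MODEL_NAME = "SUSTech/SUS-Chat-34B"


def build_completion_request(question, max_tokens=256):
    """Build the JSON body for the OpenAI-style /v1/completions endpoint."""
    return {
        "model": MODEL_NAME,
        # Single-turn prompt in the format used by the README's chat_template.
        "prompt": f"### Human: {question}\n\n### Assistant: ",
        "max_tokens": max_tokens,
        "temperature": 0.0,  # greedy decoding, so repeated runs are comparable
        "stop": ["### Human:"],  # assumed stop string: end at the next turn
    }


def complete(question):
    """POST the request and return the generated text (needs a live server)."""
    body = json.dumps(build_completion_request(question)).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["choices"][0]["text"]


# Building the payload needs no server, so it can be inspected offline.
print(json.dumps(build_completion_request("hi"), indent=2))
```

Only `complete` needs a running server; the evaluation itself is driven by TLEM as described at the link above.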
+# Usage
+SUS-Chat-34B is a standard LLaMA model and should be seamlessly
+compatible with the LLaMA ecosystem. We provide the following example to
+demonstrate how it can be used for multi-turn dialogues.
+Feel free to [open an
+issue](https://github.com/SUSTech-IDEA/SUS-Chat/issues) if you have any
+questions.
+``` python
+from transformers import AutoModelForCausalLM, AutoTokenizer # 🤗 Transformers, or
+# from modelscope import AutoModelForCausalLM, AutoTokenizer # 🤖 ModelScope
 def chat_template(messages):
     history = ""
     for message in messages:
         match message:
+            case {"role": "user", "content": message}:
                 history += f"### Human: {message}\n\n### Assistant: "
             case {"role": "assistant", "content": message}:
                 history += message
 messages = [{"role": "user", "content": "hi"}]
+input_ids = tokenizer.encode(
+    chat_template(messages), return_tensors="pt", add_special_tokens=False
+).to("cuda")
+output_ids = model.generate(input_ids.to("cuda"), max_length=256)
 response = tokenizer.decode(
+    output_ids[0][input_ids.shape[1] :], skip_special_tokens=False
 )
 messages.append({"role": "assistant", "content": response})
 messages.append({"role": "user", "content": "What is the capital of China?"})
+input_ids = tokenizer.encode(
+    chat_template(messages), return_tensors="pt", add_special_tokens=False
+).to("cuda")
+output_ids = model.generate(input_ids.to("cuda"), max_length=256)
 response = tokenizer.decode(
+    output_ids[0][input_ids.shape[1] :], skip_special_tokens=False
 )
 messages.append({"role": "assistant", "content": response})
 ```
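The diff above shows only part of `chat_template`, and the end of the function is not visible. As a minimal, self-contained sketch of the same prompt format, the version below uses `if`/`elif` instead of `match` (so it also runs on Python versions before 3.10) and assumes the function returns the accumulated string, which is what the later `tokenizer.encode(...)` call needs. No model or tokenizer is loaded here; the assistant reply in the sample conversation is taken from the widget example in the front matter.

```python
def chat_template(messages):
    """Render {"role", "content"} dicts into the '### Human:' prompt format."""
    history = ""
    for message in messages:
        role = message.get("role")
        content = message.get("content", "")
        if role == "user":
            # Each user turn opens an assistant turn for the model to complete.
            history += f"### Human: {content}\n\n### Assistant: "
        elif role == "assistant":
            # Earlier assistant replies are appended verbatim, with no separator.
            history += content
    # Assumption: the excerpt does not show the function's last line.
    return history


messages = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "Hello! How can I assist you today?"},
    {"role": "user", "content": "What is the capital of China?"},
]
prompt = chat_template(messages)
print(prompt)
```

Because assistant turns are appended without a separator, whatever terminates a reply (for example an end-of-sequence token kept by `skip_special_tokens=False`) is the only thing between that reply and the next `### Human:` marker. Note also that `max_length=256` in the example above counts prompt tokens as well as generated ones, so long histories leave less room for the reply.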
+# Limitations
+SUS-Chat has only undergone supervised fine-tuning and has not yet
+undergone human preference learning. As a result, it may produce
+unreasonable responses in some situations and exacerbate existing issues
+in language models, including hallucinations, non-determinism, and
+cumulative errors. To achieve better performance for downstream tasks,
+we recommend adjusting the generation configuration parameters
+accordingly.
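As one possible way to act on this recommendation, the sketch below collects generation settings in a small helper whose result can be passed to `model.generate`. The parameter names (`max_new_tokens`, `repetition_penalty`, `do_sample`, `temperature`, `top_p`) are standard 🤗 Transformers generation arguments; the helper name `generation_kwargs` and all values are illustrative assumptions, not settings recommended by the authors.

```python
def generation_kwargs(creative=False):
    """Return keyword arguments for `model.generate` (illustrative values only)."""
    common = {
        "max_new_tokens": 256,      # caps the reply length, excluding the prompt
        "repetition_penalty": 1.1,  # mildly discourages repeated phrases
    }
    if creative:
        # Sampling trades repeatability for more varied replies.
        return {**common, "do_sample": True, "temperature": 0.7, "top_p": 0.9}
    # Greedy decoding: no sampling randomness between runs.
    return {**common, "do_sample": False}


# Usage with the objects from the Usage section (not executed here):
# output_ids = model.generate(input_ids, **generation_kwargs())
print(generation_kwargs())
print(generation_kwargs(creative=True))
```

Greedy decoding is usually the easier starting point for tasks that need repeatable output; sampling may suit open-ended chat better.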
+# Disclaimer
+During the training process, we used data compliance check algorithms to
+ensure the compliance of the training model as much as possible. Due to
+the complexity of the data and the diverse use cases of language models,
+we cannot guarantee that the model will produce correct and reasonable
+outputs in all scenarios. Please be aware that there is still a risk of
+the model generating problematic outputs. We will not be responsible for
+any risks or issues arising from misuse, misguidance, illegal use, and
+related misinformation, as well as data security issues related to the
+model.
+# License
+This model is fully open for academic research and free
+commercial use, but it must adhere to the
+[license](https://github.com/01-ai/Yi/blob/main/MODEL_LICENSE_AGREEMENT.txt)
+from [01-ai](https://huggingface.co/01-ai).