<div align="left">
<h1>
Mengzi3-13B-Base
</h1>
</div>
<p align="center">
    <img src="https://raw.githubusercontent.com/Langboat/Mengzi3/main/assets/mengzi_logo.png" width="200"/>
</p>
<p align="center">
        🤗 <a href="https://huggingface.co/Langboat">Hugging Face</a> | 🤖 <a href="https://modelscope.cn/organization/Langboat">ModelScope</a> | <a href="https://wisemodel.cn/organization/Langboat">Wisemodel</a> | 💬 <a href="https://github.com/Langboat/Mengzi3/blob/main/assets/wechat.png">WeChat</a> | <a href="https://www.langboat.com/document/mengzi/mengzi-gpt/call">API</a> | <a href="https://www.langboat.com/portal/mengzi-gpt">孟子GPT</a>
</p>
# Introduction

This release open-sources the Mengzi3 13B model series. The models are available at:

| Size | Mengzi3-13B-Base | Mengzi3-13B-Chat |
| :-: | :-: | :-: |
| 13B | [🤗](https://huggingface.co/Langboat/Mengzi3-13B-Base) / [🤖](https://modelscope.cn/Langboat/Mengzi3-13B-Base) / [Wisemodel](https://wisemodel.cn/models/Langboat/Mengzi3-13B-Base) | Coming soon |

Mengzi3-13B is based on the Llama architecture. Its training corpus is curated from web pages, encyclopedias, social media, media outlets, and news, together with high-quality open-source datasets. Through continued training on multilingual data at the trillion-token scale, the model has outstanding Chinese-language capabilities while retaining multilingual ability.
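# Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Langboat/Mengzi3-13B-Base"

# use_fast=False loads the slow (Python) tokenizer; trust_remote_code=True allows custom code shipped in the model repo
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False, trust_remote_code=True)
# device_map="auto" (requires the accelerate package) places the weights on available GPUs first, then CPU
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", trust_remote_code=True)

# Base model: provide a text prefix for it to continue ("Tell me about Mencius:")
inputs = tokenizer('介绍一下孟子：', return_tensors='pt')
# Put the inputs on the same device as the model's first parameters (works on GPU and CPU)
inputs = inputs.to(model.device)

# Generate up to 512 new tokens; repetition_penalty > 1.0 penalizes tokens that already appeared
pred = model.generate(**inputs, max_new_tokens=512, repetition_penalty=1.1, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(pred[0], skip_special_tokens=True))
```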
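The Quickstart call leaves the weight dtype unset. As a rough guide to hardware requirements, the weights alone need about (parameter count) × (bytes per parameter). The sketch below assumes a nominal 13 billion parameters (the exact count may differ) and ignores activations, the KV cache, and framework overhead, so real memory use is higher. In transformers, `from_pretrained` accepts a `torch_dtype` argument (for example `torch_dtype=torch.bfloat16`) for loading half-precision weights.

```python
# A rough lower bound on the memory needed just to hold model weights:
#     bytes = number of parameters x bytes per parameter
# Assumptions: a nominal 13e9 parameters (the exact count of Mengzi3-13B-Base
# may differ); activations, the KV cache, and framework overhead are NOT included.

BYTES_PER_PARAM = {
    "float32": 4,
    "float16": 2,
    "bfloat16": 2,
}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Size of the weights in gigabytes (1 GB = 10**9 bytes) for the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in ("float32", "bfloat16"):
        print(f"{dtype:>8}: ~{weight_memory_gb(13e9, dtype):.0f} GB")
```

At this nominal size, float32 weights come to about 52 GB and bfloat16 weights to about 26 GB, so loading in half precision halves the footprint.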
For detailed inference and fine-tuning code, see [GitHub](https://github.com/Langboat/Mengzi3).
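As the Quickstart prompt shows, Mengzi3-13B-Base is a base model (the Chat variant is listed as coming soon), so it is steered by text it can continue rather than by chat turns. The evaluation results are reported as 5-shot, meaning solved examples precede each test question. The sketch below shows one hypothetical way to build such a prompt and trim the model's continuation; the "Question:/Answer:" template and the helper names `build_few_shot_prompt` and `extract_answer` are assumptions for illustration, not the template used by FlagEval or Langboat.

```python
# Hypothetical k-shot prompt builder for a base (completion-only) model.
# The "Question:/Answer:" template and the demo pairs are illustrative
# assumptions; they are NOT the templates used by FlagEval or Langboat.


def build_few_shot_prompt(demos, query, k=5):
    """Join up to k solved (question, answer) pairs, then the open query.

    The prompt ends with "Answer:" so the model's continuation is the answer.
    """
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in demos[:k]]
    blocks.append(f"Question: {query}\nAnswer:")
    return "\n\n".join(blocks)


def extract_answer(continuation):
    """Keep only the first line of the generated text.

    Base models often go on to invent further "Question:" blocks,
    so everything after the first newline is dropped.
    """
    return continuation.strip().split("\n", 1)[0].strip()


if __name__ == "__main__":
    demos = [
        ("What is 2 + 3?", "5"),
        ("What is 10 - 4?", "6"),
        ("What is 3 * 3?", "9"),
        ("What is 8 / 2?", "4"),
        ("What is 1 + 1?", "2"),
    ]
    print(build_few_shot_prompt(demos, "What is 7 + 6?"))
    # A made-up continuation, to show how extract_answer trims it.
    print(extract_answer(" 13\n\nQuestion: What is 9 + 9?\nAnswer: 18"))
```

The prompt string can be passed to `tokenizer(...)` exactly as in the Quickstart. Note that `pred[0]` there also contains the prompt tokens; decoding `pred[0][inputs['input_ids'].shape[1]:]` yields only the continuation to pass to `extract_answer`.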
# Evaluation

Compared with large language models of similar size, Mengzi3-13B-Base leads on the language benchmarks (MMLU, CMMLU, and OCNLI) and ranks among the top models in math (third on GSM8K) and coding (fourth on HumanEval among the models that report it).

| Model | MMLU | CMMLU | OCNLI | GSM8K | HumanEval |
| :-: | :-: | :-: | :-: | :-: | :-: |
| Baichuan2-13B-Base | 0.530 | 0.489 | 0.433 | 0.528 | 0.171 |
| Qwen-14B | 0.589 | 0.539 | 0.550 | 0.613 | 0.323 |
| ChatGLM3-6B-base | 0.551 | 0.495 | 0.754 | 0.723 | - |
| InternLM2-20B | 0.610 | 0.538 | 0.650 | 0.761 | 0.488 |
| Skywork-13B-base | 0.557 | 0.524 | 0.426 | 0.558 | - |
| LingoWhale-8B | 0.541 | 0.495 | 0.352 | 0.550 | 0.329 |
| DeepSeek-7B | 0.436 | 0.424 | 0.356 | 0.174 | 0.262 |
| DeepSeek-MoE-16B-base | 0.423 | 0.388 | 0.342 | 0.188 | 0.268 |
| MindSource-7B | 0.498 | 0.425 | 0.528 | - | - |
| **Mengzi3-13B-Base** | **0.651 (+6.7%)** | **0.588 (+9.1%)** | **0.776 (+2.9%)** | 0.631 | 0.287 |

> The results above are 5-shot. MMLU, CMMLU, and OCNLI results are from [FlagEval](https://flageval.baai.ac.cn/). Percentages in parentheses are the relative improvement over the best other model in the same column.
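The percentages in the Mengzi3-13B-Base row can be checked against the rest of the evaluation table. The sketch below, a hypothetical helper script rather than anything shipped with the model, recomputes each bolded gain as the relative improvement over the strongest other model in that column, and ranks Mengzi3-13B-Base on GSM8K and HumanEval among the models that report a score. The names `relative_gain` and `rank` are illustrative.

```python
# Scores copied from the evaluation table above; None stands for a "-" entry.
METRICS = ("MMLU", "CMMLU", "OCNLI", "GSM8K", "HumanEval")
ROWS = {
    "Baichuan2-13B-Base": (0.530, 0.489, 0.433, 0.528, 0.171),
    "Qwen-14B": (0.589, 0.539, 0.550, 0.613, 0.323),
    "ChatGLM3-6B-base": (0.551, 0.495, 0.754, 0.723, None),
    "InternLM2-20B": (0.610, 0.538, 0.650, 0.761, 0.488),
    "Skywork-13B-base": (0.557, 0.524, 0.426, 0.558, None),
    "LingoWhale-8B": (0.541, 0.495, 0.352, 0.550, 0.329),
    "DeepSeek-7B": (0.436, 0.424, 0.356, 0.174, 0.262),
    "DeepSeek-MoE-16B-base": (0.423, 0.388, 0.342, 0.188, 0.268),
    "MindSource-7B": (0.498, 0.425, 0.528, None, None),
    "Mengzi3-13B-Base": (0.651, 0.588, 0.776, 0.631, 0.287),
}
SCORES = {name: dict(zip(METRICS, row)) for name, row in ROWS.items()}
TARGET = "Mengzi3-13B-Base"


def relative_gain(scores, target, metric):
    """Relative improvement of `target` over the best *other* model on `metric`."""
    best_other = max(
        s[metric]
        for name, s in scores.items()
        if name != target and s[metric] is not None
    )
    return (scores[target][metric] - best_other) / best_other


def rank(scores, target, metric):
    """1-based rank of `target` among models reporting `metric` (higher is better)."""
    mine = scores[target][metric]
    return 1 + sum(
        1 for s in scores.values() if s[metric] is not None and s[metric] > mine
    )


if __name__ == "__main__":
    for metric in ("MMLU", "CMMLU", "OCNLI"):
        print(f"{metric}: {relative_gain(SCORES, TARGET, metric):+.1%} over the best other model")
    for metric in ("GSM8K", "HumanEval"):
        print(f"{metric}: rank {rank(SCORES, TARGET, metric)}")
```

Running it reproduces the +6.7%, +9.1%, and +2.9% figures and places Mengzi3-13B-Base third on GSM8K and fourth on HumanEval.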
# License Agreement

Mengzi3-13B-Base is open-sourced under the Apache 2.0 license. It is fully open for academic research and free for commercial use. To apply for a commercial license, please [contact us](https://www.langboat.com/en/form?p=3); for other business cooperation, contact [bd@langboat.com](mailto:bd@langboat.com).