---
license: apache-2.0
language:
- zh
- en
pipeline_tag: text-generation
---
<div align="left">
<h1>
Mengzi3-13B-Base
</h1>
</div>

<p align="center">
    <img src="https://raw.githubusercontent.com/Langboat/Mengzi3/main/assets/mengzi_logo.png" width="200"/>
</p>

<p align="center">
        🤗 <a href="https://huggingface.co/Langboat">Hugging Face</a> | 🤖 <a href="https://modelscope.cn/organization/Langboat">ModelScope</a> |  <a href="https://wisemodel.cn/organization/Langboat">Wisemodel</a> ｜ 💬 <a href="https://github.com/Langboat/Mengzi3/blob/main/assets/wechat.png">WeChat</a> | <a href="https://www.langboat.com/document/mengzi/mengzi-gpt/call">API</a> | <a href="https://www.langboat.com/portal/mengzi-gpt">孟子GPT</a>
</p>

# Introduction

This release open-sources the Mengzi3 13B model series. The models are available at the following locations:

|    |                                                                               Mengzi3-13B-Base                                                                               | Mengzi3-13B-Chat |
| :-: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------: |
| 13B | [🤗](https://huggingface.co/Langboat/Mengzi3-13B-Base) / [🤖](https://modelscope.cn/Langboat/Mengzi3-13B-Base) / [Wisemodel](https://wisemodel.cn/models/Langboat/Mengzi3-13B-Base) |   Coming soon    |

Mengzi3-13B is based on the Llama architecture. Its corpus is curated from web pages, encyclopedias, social networks, media, news, and high-quality open-source datasets. Continued training on trillion-scale tokens of multilingual text gives the model strong Chinese capability while retaining multilingual capability.

# Quickstart

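```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Langboat/Mengzi3-13B-Base"

# Load the slow tokenizer and the model. device_map="auto" requires the
# accelerate package and places the weights on the available devices.
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", trust_remote_code=True)

# Prompt: "Instruction: Answer the following question. Input: Introduce Mencius. Output:"
inputs = tokenizer('指令：回答以下问题。输入：介绍一下孟子。输出：', return_tensors='pt')
# Move the inputs to the model's device instead of assuming CUDA.
inputs = inputs.to(model.device)

pred = model.generate(**inputs, max_new_tokens=512, repetition_penalty=1.01, eos_token_id=tokenizer.eos_token_id)
# The decoded text contains the prompt followed by the generated continuation.
print(tokenizer.decode(pred[0], skip_special_tokens=True))
```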

For detailed inference and fine-tuning code, see [GitHub](https://github.com/Langboat/Mengzi3).
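
The Quickstart prompt follows an instruction/input/output pattern (`指令：…输入：…输出：`). The sketch below shows one way to build such prompts and to drop the echoed prompt tokens from a generated sequence. The names `PROMPT_TEMPLATE`, `build_prompt`, and `completion_ids` are hypothetical helpers introduced here for illustration, not part of the Mengzi3 repository, and the template is inferred from the single Quickstart example rather than from an official specification.

```python
# Hypothetical helpers; the template mirrors the single Quickstart prompt string.
PROMPT_TEMPLATE = "指令：{instruction}输入：{user_input}输出："


def build_prompt(instruction: str, user_input: str) -> str:
    """Fill the instruction/input/output pattern seen in the Quickstart."""
    return PROMPT_TEMPLATE.format(instruction=instruction, user_input=user_input)


def completion_ids(output_ids, prompt_length: int):
    """Return only the tokens that follow the prompt.

    For a decoder-only model, `generate` returns the prompt tokens followed by
    the new tokens, so slicing at the prompt length keeps the continuation.
    """
    return output_ids[prompt_length:]


prompt = build_prompt("回答以下问题。", "介绍一下孟子。")
print(prompt)
```

With the Quickstart variables, `tokenizer.decode(completion_ids(pred[0], inputs["input_ids"].shape[1]), skip_special_tokens=True)` would then print the continuation without the prompt.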

# Evaluation

Across the benchmarks below, Mengzi3-13B-Base leads large language models of comparable parameter count in language ability and ranks among the top in math and programming.

|                            |          MMLU          |          CMMLU          |          OCNLI          | GSM8K | HumanEval |
| :------------------------: | :---------------------: | :---------------------: | :---------------------: | :---: | :-------: |
|     Baichuan2-13B-Base     |          0.530          |          0.489          |          0.433          | 0.528 |   0.171   |
|          Qwen-14B          |          0.589          |          0.539          |          0.550          | 0.613 |   0.323   |
|      ChatGLM3-6B-base      |          0.551          |          0.495          |          0.754          | 0.723 |     -     |
|       InternLM2-20B       |          0.610          |          0.538          |          0.650          | 0.761 |   0.488   |
|      Skywork-13B-base      |          0.557          |          0.524          |          0.426          | 0.558 |     -     |
|       LingoWhale-8B       |          0.541          |          0.495          |          0.352          | 0.550 |   0.329   |
|        DeepSeek-7B        |          0.436          |          0.424          |          0.356          | 0.174 |   0.262   |
|   DeepSeek-MoE-16B-base   |          0.423          |          0.388          |          0.342          | 0.188 |   0.268   |
|       MindSource-7B       |          0.498          |          0.425          |          0.528          |   -   |     -     |
| **Mengzi3-13B-Base** | **0.651 (+6.7%)** | **0.588 (+9.1%)** | **0.776 (+2.9%)** | 0.631 |   0.287   |

> The results above are 5-shot. MMLU, CMMLU, and OCNLI results are from [FlagEval](https://flageval.baai.ac.cn/).
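
The table does not say how the percentages in parentheses are computed. They appear consistent with the relative gain over the best-scoring other model in each column, which is an interpretation rather than a statement from the source. A minimal sketch that checks this reading, using scores copied from the table (the names `BASELINES`, `MENGZI3`, and `relative_gain` are illustrative):

```python
# Baseline scores copied from the table, in row order (Baichuan2 ... MindSource).
BASELINES = {
    "MMLU": [0.530, 0.589, 0.551, 0.610, 0.557, 0.541, 0.436, 0.423, 0.498],
    "CMMLU": [0.489, 0.539, 0.495, 0.538, 0.524, 0.495, 0.424, 0.388, 0.425],
    "OCNLI": [0.433, 0.550, 0.754, 0.650, 0.426, 0.352, 0.356, 0.342, 0.528],
}
MENGZI3 = {"MMLU": 0.651, "CMMLU": 0.588, "OCNLI": 0.776}


def relative_gain(score: float, baselines: list) -> float:
    """Percentage gain of `score` over the best baseline score."""
    best = max(baselines)
    return (score - best) / best * 100.0


for benchmark, score in MENGZI3.items():
    print(f"{benchmark}: +{relative_gain(score, BASELINES[benchmark]):.1f}%")
# MMLU: +6.7%
# CMMLU: +9.1%
# OCNLI: +2.9%
```

Under this reading, the MMLU figure is measured against InternLM2-20B (0.610), CMMLU against Qwen-14B (0.539), and OCNLI against ChatGLM3-6B-base (0.754).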

# License Agreement

Mengzi3-13B-Base is open-sourced under the Apache 2.0 license. It is fully open for academic research and free for commercial use. To apply for a commercial license, please [contact us](https://www.langboat.com/en/form?p=3); for other business cooperation, contact [bd@langboat.com](mailto:bd@langboat.com).