---
license: apache-2.0
language:
- zh
- en
pipeline_tag: text-generation
---
<div align="left">
<h1>
Mengzi3-13B-Base
</h1>
</div>
<p align="center">
<img src="https://raw.githubusercontent.com/Langboat/Mengzi3/main/assets/mengzi_logo.png" width="200"/>
</p>
<p align="center">
🤗 <a href="https://huggingface.co/Langboat">Hugging Face</a> | 🤖 <a href="https://modelscope.cn/organization/Langboat">ModelScope</a> | <a href="https://gitee.com/mindspore/mindformers/blob/r1.0/research/mengzi3/mengzi3.md"><img src="https://www.mindspore.cn/_static/logo-zh-light.99fc9222.svg" width="50" style="white-space: nowrap;display: inline-block;overflow: hidden;max-width: 100%;"/></a> | <a href="https://wisemodel.cn/organization/Langboat">Wisemodel</a> | 💬 <a href="https://github.com/Langboat/Mengzi3/blob/main/assets/wechat.png">WeChat</a> | <a href="https://www.langboat.com/document/mengzi/mengzi-gpt/call">API</a> | <a href="https://www.langboat.com/portal/mengzi-gpt"><img src="https://raw.githubusercontent.com/Langboat/Mengzi3/main/assets/mengzi_logo.png" width="16" style="white-space: nowrap;display: inline-block;overflow: hidden;max-width: 100%;"/> 孟子GPT</a>
</p>
# Introduction
This release open-sources the Mengzi3 13B model series. The models are available at the following locations:
| | Mengzi3-13B-Base | Mengzi3-13B-Chat |
| :-: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------: |
| 13B | [🤗](https://huggingface.co/Langboat/Mengzi3-13B-Base) / [🤖](https://modelscope.cn/Langboat/Mengzi3-13B-Base) / [MindSpore](https://gitee.com/mindspore/mindformers/blob/r1.0/research/mengzi3/mengzi3.md) / [Wisemodel](https://wisemodel.cn/models/Langboat/Mengzi3-13B-Base) | Coming soon |
Mengzi3-13B is based on the Llama architecture. Its corpus is curated from web pages, encyclopedias, social media, media outlets, news, and high-quality open-source datasets. Through continued training on a multilingual corpus at the trillion-token scale, the model achieves strong Chinese capabilities while retaining multilingual capabilities.
# Quickstart
```python
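import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model; device_map="auto" places the weights on the available device(s).
tokenizer = AutoTokenizer.from_pretrained("Langboat/Mengzi3-13B-Base", use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Langboat/Mengzi3-13B-Base", device_map="auto", trust_remote_code=True)

# Prompt (Chinese): "Instruction: Answer the following question. Input: Introduce Mengzi. Output:"
inputs = tokenizer('指令:回答以下问题。输入:介绍一下孟子。输出:', return_tensors='pt')
# Move the input tensors to the GPU when one is available.
if torch.cuda.is_available():
    inputs = inputs.to('cuda')

# Generate up to 512 new tokens with a mild repetition penalty, then decode the full sequence.
pred = model.generate(**inputs, max_new_tokens=512, repetition_penalty=1.01, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(pred[0], skip_special_tokens=True))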
```
For detailed inference and fine-tuning code, see [GitHub](https://github.com/Langboat/Mengzi3).
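
The Quickstart prompt follows an instruction/input/output layout. The helper below is a minimal sketch of how such prompts could be assembled. The template is inferred from that single example and is an assumption, not a documented format, and `build_prompt` is a hypothetical name rather than part of any Mengzi3 API.

```python
def build_prompt(instruction: str, user_input: str) -> str:
    """Assemble a prompt in the layout used by the Quickstart example.

    Assumption: the "指令:…输入:…输出:" (instruction/input/output) template
    is inferred from the one example prompt, not from official documentation.
    """
    return f"指令:{instruction}输入:{user_input}输出:"


# Reproduces the exact prompt string used in the Quickstart.
prompt = build_prompt("回答以下问题。", "介绍一下孟子。")
print(prompt)
```

The resulting string can be passed to `tokenizer(prompt, return_tensors='pt')` in place of the hard-coded prompt.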
# Evaluation
Across the benchmarks below, Mengzi3-13B-Base leads large language models of comparable parameter count in language ability and ranks among the top in math and programming.
| | MMLU | CMMLU | OCNLI | GSM8K | HumanEval |
| :------------------------: | :---------------------: | :---------------------: | :---------------------: | :---: | :-------: |
| Baichuan2-13B-Base | 0.530 | 0.489 | 0.433 | 0.528 | 0.171 |
| Qwen-14B | 0.589 | 0.539 | 0.550 | 0.613 | 0.323 |
| ChatGLM3-6B-base | 0.551 | 0.495 | 0.754 | 0.723 | - |
| InternLM2-20B | 0.610 | 0.538 | 0.650 | 0.761 | 0.488 |
| Skywork-13B-base | 0.557 | 0.524 | 0.426 | 0.558 | - |
| LingoWhale-8B | 0.541 | 0.495 | 0.352 | 0.550 | 0.329 |
| DeepSeek-7B | 0.436 | 0.424 | 0.356 | 0.174 | 0.262 |
| DeepSeek-MoE-16B-base | 0.423 | 0.388 | 0.342 | 0.188 | 0.268 |
| MindSource-7B | 0.498 | 0.425 | 0.528 | - | - |
| **Mengzi3-13B-Base** | **0.651 (+6.7%)** | **0.588 (+9.1%)** | **0.776 (+2.9%)** | 0.631 | 0.287 |
> The results above are 5-shot. The MMLU/CMMLU/OCNLI results are from [FlagEval](https://flageval.baai.ac.cn/).
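
The percentages in parentheses in the Mengzi3-13B-Base row are not defined in the text. They appear to be relative gains over the best score among the other listed models in each column. That reading is an inference from the table, not a statement by the authors, and the short sketch below checks it against the table's numbers.

```python
# Scores copied from the Mengzi3-13B-Base row of the table.
mengzi = {"MMLU": 0.651, "CMMLU": 0.588, "OCNLI": 0.776}
# Highest score among the other listed models in each column
# (InternLM2-20B, Qwen-14B, and ChatGLM3-6B-base respectively).
best_other = {"MMLU": 0.610, "CMMLU": 0.539, "OCNLI": 0.754}


def relative_gain(score: float, baseline: float) -> float:
    """Return the relative improvement of score over baseline, in percent."""
    return (score - baseline) / baseline * 100


gains = {name: round(relative_gain(mengzi[name], best_other[name]), 1) for name in mengzi}
print(gains)
```

Under this assumption the computed gains match the reported +6.7%, +9.1%, and +2.9%.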
# License Agreement
Mengzi3-13B-Base is open-sourced under the Apache 2.0 license. It is fully open for academic research and free for commercial use. To apply for a commercial license, please [contact us](https://www.langboat.com/en/form?p=3). For other business cooperation, email [bd@langboat.com](mailto:bd@langboat.com).