---
license: apache-2.0
language:
- zh
- en
pipeline_tag: text-generation
---

Mengzi3-13B-Base

🤗 Hugging Face | 🤖 ModelScope | Wisemodel | 💬 WeChat | API | 孟子GPT

# Introduction

This release open-sources the Mengzi3 13B model series. The models are available at the following locations:

|     | Mengzi3-13B-Base | Mengzi3-13B-Chat |
| :-: | :-: | :-: |
| 13B | [🤗](https://huggingface.co/Langboat/Mengzi3-13B-Base) / [🤖](https://modelscope.cn/Langboat/Mengzi3-13B-Base) / [Wisemodel](https://wisemodel.cn/models/Langboat/Mengzi3-13B-Base) | Coming soon |

Mengzi3-13B is based on the Llama architecture. Its training corpus is selected from web pages, encyclopedias, social networks, media, news, and high-quality open-source datasets. Through continued training on trillions of tokens of multilingual text, the model achieves strong Chinese capabilities while retaining multilingual ability.

# Quickstart

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model; device_map="auto" places the weights on the available devices.
tokenizer = AutoTokenizer.from_pretrained("Langboat/Mengzi3-13B-Base", use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Langboat/Mengzi3-13B-Base", device_map="auto", trust_remote_code=True)

# The prompt reads: "Instruction: Answer the following question. Input: Introduce Mengzi (Mencius). Output:"
inputs = tokenizer('指令:回答以下问题。输入:介绍一下孟子。输出:', return_tensors='pt')
if torch.cuda.is_available():
    inputs = inputs.to('cuda')

# Generate up to 512 new tokens with a mild repetition penalty, then decode the result.
pred = model.generate(**inputs, max_new_tokens=512, repetition_penalty=1.01, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(pred[0], skip_special_tokens=True))
```

For detailed inference and fine-tuning code, see [GitHub](https://github.com/Langboat/Mengzi3).

# Evaluation

Compared with large language models of similar parameter count, Mengzi3-13B-Base achieves the highest scores on the language benchmarks (MMLU, CMMLU, OCNLI) and ranks among the stronger models on math (GSM8K) and programming (HumanEval).
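The results table marks the Mengzi3-13B-Base language scores with bracketed gains such as (+6.7%). The source does not say how these are computed. The sketch below assumes they are relative improvements over the best competing score in each column; the helper name `relative_gain_percent` is hypothetical, and the scores are copied from the table.

```python
# Scores copied from the results table. "best_other" is the highest score
# among the other models in that column (an assumption about the baseline).
scores = {
    "MMLU": {"mengzi3": 0.651, "best_other": 0.610},   # InternLM2-20B
    "CMMLU": {"mengzi3": 0.588, "best_other": 0.539},  # Qwen-14B
    "OCNLI": {"mengzi3": 0.776, "best_other": 0.754},  # ChatGLM3-6B-base
}


def relative_gain_percent(score: float, baseline: float) -> float:
    """Return the relative improvement of score over baseline, in percent."""
    return round((score / baseline - 1.0) * 100, 1)


gains = {
    name: relative_gain_percent(v["mengzi3"], v["best_other"])
    for name, v in scores.items()
}

for name, gain in gains.items():
    print(f"{name}: +{gain}%")
```

Under this assumption the computed values match the bracketed figures in the table, which suggests the gains are relative rather than absolute differences.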

|                       | MMLU | CMMLU | OCNLI | GSM8K | HumanEval |
| :-: | :-: | :-: | :-: | :-: | :-: |
| Baichuan2-13B-Base    | 0.530 | 0.489 | 0.433 | 0.528 | 0.171 |
| Qwen-14B              | 0.589 | 0.539 | 0.550 | 0.613 | 0.323 |
| ChatGLM3-6B-base      | 0.551 | 0.495 | 0.754 | 0.723 | - |
| InternLM2-20B         | 0.610 | 0.538 | 0.650 | 0.761 | 0.488 |
| Skywork-13B-base      | 0.557 | 0.524 | 0.426 | 0.558 | - |
| LingoWhale-8B         | 0.541 | 0.495 | 0.352 | 0.550 | 0.329 |
| DeepSeek-7B           | 0.436 | 0.424 | 0.356 | 0.174 | 0.262 |
| DeepSeek-MoE-16B-base | 0.423 | 0.388 | 0.342 | 0.188 | 0.268 |
| MindSource-7B         | 0.498 | 0.425 | 0.528 | - | - |
| **Mengzi3-13B-Base**  | **0.651 (+6.7%)** | **0.588 (+9.1%)** | **0.776 (+2.9%)** | 0.631 | 0.287 |

> The results above are 5-shot. The MMLU, CMMLU, and OCNLI results are from [FlagEval](https://flageval.baai.ac.cn/).

# License Agreement

Mengzi3-13B-Base is open-sourced under the Apache 2.0 license. It is fully open for academic research and free for commercial use. To apply for a commercial license, please [contact us](https://www.langboat.com/en/form?p=3); for other business cooperation, contact [bd@langboat.com](mailto:bd@langboat.com).