Update README.md
README.md
CHANGED
@@ -1,3 +1,73 @@
|
|
1 |
-
|
2 |
-
|
3 |
-
|
1 |
+
<div align="left">
|
2 |
+
<h1>
|
3 |
+
Mengzi3-13B-Base
|
4 |
+
</h1>
|
5 |
+
</div>
|
6 |
+
|
7 |
+
<p align="center">
|
8 |
+
<img src="https://raw.githubusercontent.com/Langboat/Mengzi3/main/assets/mengzi_logo.png" width="200"/>
|
9 |
+
</p>
|
10 |
+
|
11 |
+
<p align="center">
|
12 |
+
🤗 <a href="https://huggingface.co/Langboat">Hugging Face</a> | 🤖 <a href="https://modelscope.cn/organization/Langboat">ModelScope</a> | <a href="https://wisemodel.cn/organization/Langboat">Wisemodel</a> | 💬 <a href="https://github.com/Langboat/Mengzi3/blob/main/assets/wechat.png">WeChat</a> | <a href="https://www.langboat.com/document/mengzi/mengzi-gpt/call">API</a> | <a href="https://www.langboat.com/portal/mengzi-gpt">孟子GPT</a>
|
13 |
+
</p>
|
14 |
+
|
15 |
+
# 模型介绍/Introduction
|
16 |
+
|
17 |
+
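本次开源Mengzi3 13B系列模型,模型的地址如下 / This release open-sources the Mengzi3 13B model series. The models are available at the following links: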
|
18 |
+
|
19 |
+
| | Mengzi3-13B-Base | Mengzi3-13B-Chat |
|
20 |
+
| :-: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------: |
|
21 |
+
| 13B | [🤗](https://huggingface.co/Langboat/Mengzi3-13B-Base) / [🤖](https://modelscope.cn/Langboat/Mengzi3-13B-Base) / [Wisemodel](https://wisemodel.cn/models/Langboat/Mengzi3-13B-Base) | Coming soon |
|
22 |
+
|
23 |
+
Mengzi3-13B模型基于Llama架构,语料精选自网页、百科、社交、媒体、新闻,以及高质量的开源数据集。通过在万亿tokens上进行多语言语料的继续训练,模型的中文能力突出并且兼顾多语言能力。
|
24 |
+
|
25 |
+
Mengzi3-13B is based on the Llama architecture. Its training corpus is curated from web pages, encyclopedias, social media, news media, and high-quality open-source datasets. After continued training on a multilingual corpus of trillions of tokens, the model shows strong Chinese capability while retaining multilingual capability.
|
26 |
+
|
27 |
+
# 快速开始/Quickstart
|
28 |
+
|
29 |
+
```python
|
30 |
+
import torch
|
31 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer
|
32 |
+
|
33 |
+
tokenizer = AutoTokenizer.from_pretrained("Langboat/Mengzi3-13B-Base", use_fast=False, trust_remote_code=True)
|
34 |
+
model = AutoModelForCausalLM.from_pretrained("Langboat/Mengzi3-13B-Base", device_map="auto", trust_remote_code=True)
|
35 |
+
inputs = tokenizer('介绍一下孟子:', return_tensors='pt')
|
36 |
+
if torch.cuda.is_available():
|
37 |
+
    inputs = inputs.to('cuda')
|
38 |
+
pred = model.generate(**inputs, max_new_tokens=512, repetition_penalty=1.1, eos_token_id=tokenizer.eos_token_id)
|
39 |
+
print(tokenizer.decode(pred[0], skip_special_tokens=True))
|
40 |
+
```
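
The quickstart above can be wrapped in a small helper so that generation settings are easy to override per call. The following is a minimal sketch, not part of the Mengzi3 repository: the names `GENERATION_DEFAULTS`, `generation_kwargs`, and `complete` are hypothetical, and the default values are simply copied from the `model.generate` call above.

```python
# Sketch only: GENERATION_DEFAULTS, generation_kwargs and complete are
# hypothetical names introduced for illustration.
MODEL_ID = "Langboat/Mengzi3-13B-Base"

# Defaults copied from the quickstart call to model.generate().
GENERATION_DEFAULTS = {"max_new_tokens": 512, "repetition_penalty": 1.1}


def generation_kwargs(**overrides):
    """Return a copy of the quickstart defaults with caller overrides applied."""
    merged = dict(GENERATION_DEFAULTS)
    merged.update(overrides)
    return merged


def complete(prompt, **overrides):
    """Load the model and return the decoded continuation of `prompt`.

    The model is reloaded on every call to keep the sketch short; in
    practice you would load it once and reuse it.
    """
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=False, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto", trust_remote_code=True)

    # Put the inputs on the same device as the model instead of hard-coding 'cuda'.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    kwargs = generation_kwargs(**overrides)
    kwargs.setdefault("eos_token_id", tokenizer.eos_token_id)
    pred = model.generate(**inputs, **kwargs)
    return tokenizer.decode(pred[0], skip_special_tokens=True)
```

For example, `complete('介绍一下孟子:', max_new_tokens=128)` would shorten the output while keeping the other quickstart settings.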
|
41 |
+
|
42 |
+
详细的模型推理和微调代码见[Github](https://github.com/Langboat/Mengzi3)
|
43 |
+
|
44 |
+
For detailed inference and fine-tuning code, see [GitHub](https://github.com/Langboat/Mengzi3).
|
45 |
+
|
46 |
+
# 性能评测/Evaluation
|
47 |
+
|
48 |
+
Mengzi3-13B-Base在各项基准测试中与同等参数量大语言模型相比,语言能力成绩领先,数学和编程能力位于前列。
|
49 |
+
|
50 |
+
Across the benchmarks below, Mengzi3-13B-Base leads large language models of comparable parameter count in language ability and ranks near the top in math and programming.
|
51 |
+
|
52 |
+
| | MMLU | CMMLU | OCNLI | GSM8K | HumanEval |
|
53 |
+
| :------------------------: | :---------------------: | :---------------------: | :---------------------: | :---: | :-------: |
|
54 |
+
| Baichuan2-13B-Base | 0.530 | 0.489 | 0.433 | 0.528 | 0.171 |
|
55 |
+
| Qwen-14B | 0.589 | 0.539 | 0.550 | 0.613 | 0.323 |
|
56 |
+
| ChatGLM3-6B-base | 0.551 | 0.495 | 0.754 | 0.723 | - |
|
57 |
+
| InternLM2-20B | 0.610 | 0.538 | 0.650 | 0.761 | 0.488 |
|
58 |
+
| Skywork-13B-base | 0.557 | 0.524 | 0.426 | 0.558 | - |
|
59 |
+
| LingoWhale-8B | 0.541 | 0.495 | 0.352 | 0.550 | 0.329 |
|
60 |
+
| DeepSeek-7B | 0.436 | 0.424 | 0.356 | 0.174 | 0.262 |
|
61 |
+
| DeepSeek-MoE-16B-base | 0.423 | 0.388 | 0.342 | 0.188 | 0.268 |
|
62 |
+
| MindSource-7B | 0.498 | 0.425 | 0.528 | - | - |
|
63 |
+
| **Mengzi3-13B-Base** | **0.651 (+6.7%)** | **0.588 (+9.1%)** | **0.776 (+2.9%)** | 0.631 | 0.287 |
|
64 |
+
|
65 |
+
> 以上结果基于5-shot,MMLU/CMMLU/OCNLI结果来自[FlagEval](https://flageval.baai.ac.cn/)
|
66 |
+
>
|
67 |
+
> The results above are 5-shot. MMLU, CMMLU, and OCNLI results are from [FlagEval](https://flageval.baai.ac.cn/).
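
The README does not say how the percentages in the Mengzi3-13B-Base row are computed. As a sketch under one assumption, the code below treats each as the relative gain over the best competing score in the same column; that reading reproduces the three figures shown. The scores are copied from the table, and the helper name `relative_margin` is hypothetical.

```python
# Scores copied from the evaluation table; None marks a missing ("-") entry.
BENCHMARKS = ["MMLU", "CMMLU", "OCNLI", "GSM8K", "HumanEval"]
OTHERS = {
    "Baichuan2-13B-Base":    [0.530, 0.489, 0.433, 0.528, 0.171],
    "Qwen-14B":              [0.589, 0.539, 0.550, 0.613, 0.323],
    "ChatGLM3-6B-base":      [0.551, 0.495, 0.754, 0.723, None],
    "InternLM2-20B":         [0.610, 0.538, 0.650, 0.761, 0.488],
    "Skywork-13B-base":      [0.557, 0.524, 0.426, 0.558, None],
    "LingoWhale-8B":         [0.541, 0.495, 0.352, 0.550, 0.329],
    "DeepSeek-7B":           [0.436, 0.424, 0.356, 0.174, 0.262],
    "DeepSeek-MoE-16B-base": [0.423, 0.388, 0.342, 0.188, 0.268],
    "MindSource-7B":         [0.498, 0.425, 0.528, None, None],
}
MENGZI = [0.651, 0.588, 0.776, 0.631, 0.287]


def relative_margin(column):
    """Percent gain of Mengzi3-13B-Base over the best other model in a column."""
    best_other = max(row[column] for row in OTHERS.values() if row[column] is not None)
    return (MENGZI[column] - best_other) / best_other * 100


margins = {name: round(relative_margin(i), 1) for i, name in enumerate(BENCHMARKS)}
for name, value in margins.items():
    print(f"{name}: {value:+.1f}%")
```

Under this assumption the MMLU, CMMLU, and OCNLI margins come out to +6.7%, +9.1%, and +2.9%, matching the table. The GSM8K and HumanEval margins are negative, which is consistent with those cells carrying no percentage.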
|
68 |
+
|
69 |
+
# 协议/License Agreement
|
70 |
+
|
71 |
+
Mengzi3-13B-Base依照Apache 2.0协议开源,对学术研究完全开放,同时支持免费商用。如需申请商业许可证,请[联系我们](https://www.langboat.com/form?p=3),其他商务合作请联系[bd@langboat.com](mailto:bd@langboat.com)。
|
72 |
+
|
73 |
+
Mengzi3-13B-Base is open-sourced under the Apache 2.0 license. It is fully open for academic research and free for commercial use. To apply for a commercial license, please [contact us](https://www.langboat.com/en/form?p=3). For other business cooperation, please contact [bd@langboat.com](mailto:bd@langboat.com).
|