chunhui committed on
Commit
19a5551
1 Parent(s): 5ad47c5

updata README.md

Files changed (1)
  1. README.md +73 -3
README.md CHANGED
@@ -1,3 +1,73 @@
- ---
- license: apache-2.0
- ---
+ <div align="left">
+ <h1>
+ Mengzi3-13B-Base
+ </h1>
+ </div>
+
+ <p align="center">
+ <img src="https://raw.githubusercontent.com/Langboat/Mengzi3/main/assets/mengzi_logo.png" width="200"/>
+ </p>
+
+ <p align="center">
+ 🤗 <a href="https://huggingface.co/Langboat">Hugging Face</a> | 🤖 <a href="https://modelscope.cn/organization/Langboat">ModelScope</a> | <a href="https://wisemodel.cn/organization/Langboat">Wisemodel</a> | 💬 <a href="https://github.com/Langboat/Mengzi3/blob/main/assets/wechat.png">WeChat</a> | <a href="https://www.langboat.com/document/mengzi/mengzi-gpt/call">API</a> | <a href="https://www.langboat.com/portal/mengzi-gpt">孟子GPT</a>
+ </p>
+
+ # 模型介绍/Introduction
+
+ 本次开源Mengzi3 13B系列模型,模型的地址如下 / This release open-sources the Mengzi3 13B model series; the models are available at the following locations:
+
+ | | Mengzi3-13B-Base | Mengzi3-13B-Chat |
+ | :-: | :-: | :-: |
+ | 13B | [🤗](https://huggingface.co/Langboat/Mengzi3-13B-Base) / [🤖](https://modelscope.cn/Langboat/Mengzi3-13B-Base) / [Wisemodel](https://wisemodel.cn/models/Langboat/Mengzi3-13B-Base) | Coming soon |
+
+ Mengzi3-13B模型基于Llama架构,语料精选自网页、百科、社交、媒体、新闻,以及高质量的开源数据集。通过在万亿tokens上进行多语言语料的继续训练,模型的中文能力突出并且兼顾多语言能力。
+
+ Mengzi3-13B is based on the Llama architecture. Its corpus is curated from web pages, encyclopedias, social networking, media, news, and high-quality open-source datasets. Through continued training on trillions of tokens of multilingual text, the model achieves strong Chinese capabilities while retaining multilingual capabilities.
+
+ # 快速开始/Quickstart
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("Langboat/Mengzi3-13B-Base", use_fast=False, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained("Langboat/Mengzi3-13B-Base", device_map="auto", trust_remote_code=True)
+ inputs = tokenizer('介绍一下孟子:', return_tensors='pt')  # prompt: "Introduce Mencius:"
+ if torch.cuda.is_available():
+     inputs = inputs.to('cuda')  # move the input tensors to the GPU when one is present
+ pred = model.generate(**inputs, max_new_tokens=512, repetition_penalty=1.1, eos_token_id=tokenizer.eos_token_id)
+ print(tokenizer.decode(pred[0], skip_special_tokens=True))
+ ```
+
+ 详细的模型推理和微调代码见[Github](https://github.com/Langboat/Mengzi3)
+
+ For detailed inference and fine-tuning code, see [GitHub](https://github.com/Langboat/Mengzi3)
+
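The Quickstart call passes `repetition_penalty=1.1` to `generate`. As a rough illustration of what that setting does, the sketch below reimplements the rescaling rule in plain Python, as I understand it from the `transformers` repetition-penalty logits processor: the logit of every token already present in the sequence is divided by the penalty when positive and multiplied by it when negative. The function name `apply_repetition_penalty` and the toy logits are hypothetical; this is a simplified stand-in, not the library's code.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty=1.1):
    """Return a copy of `logits` in which already-seen tokens are made less likely.

    Hypothetical helper for illustration only: `logits` is a plain list of
    floats indexed by token id, `seen_token_ids` lists the ids generated so far.
    """
    adjusted = list(logits)  # copy so the caller's list is left untouched
    for token_id in set(seen_token_ids):
        score = adjusted[token_id]
        # Positive scores shrink toward zero; negative scores move further below zero.
        adjusted[token_id] = score * penalty if score < 0 else score / penalty
    return adjusted


# Toy vocabulary of three tokens; tokens 0 and 1 were already generated.
logits = [2.2, -1.0, 0.5]
adjusted = apply_repetition_penalty(logits, seen_token_ids=[0, 1, 0], penalty=1.1)
print(adjusted)
```

With a penalty above 1, both branches lower the score of a repeated token, which is why a value such as 1.1 discourages loops without forbidding repetition outright. Token 2, which has not appeared yet, keeps its original score.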
+ # 性能评测/Evaluation
+
+ Mengzi3-13B-Base在各项基准测试中与同等参数量大语言模型相比,语言能力成绩领先,数学和编程能力位于前列。
+
+ Across the benchmarks below, Mengzi3-13B-Base leads large language models of comparable parameter count in language ability and ranks near the top in math and programming.
+
+ | | MMLU | CMMLU | OCNLI | GSM8K | HumanEval |
+ | :------------------------: | :---------------------: | :---------------------: | :---------------------: | :---: | :-------: |
+ | Baichuan2-13B-Base | 0.530 | 0.489 | 0.433 | 0.528 | 0.171 |
+ | Qwen-14B | 0.589 | 0.539 | 0.550 | 0.613 | 0.323 |
+ | ChatGLM3-6B-base | 0.551 | 0.495 | 0.754 | 0.723 | - |
+ | InternLM2-20B | 0.610 | 0.538 | 0.650 | 0.761 | 0.488 |
+ | Skywork-13B-base | 0.557 | 0.524 | 0.426 | 0.558 | - |
+ | LingoWhale-8B | 0.541 | 0.495 | 0.352 | 0.550 | 0.329 |
+ | DeepSeek-7B | 0.436 | 0.424 | 0.356 | 0.174 | 0.262 |
+ | DeepSeek-MoE-16B-base | 0.423 | 0.388 | 0.342 | 0.188 | 0.268 |
+ | MindSource-7B | 0.498 | 0.425 | 0.528 | - | - |
+ | **Mengzi3-13B-Base** | **0.651 (+6.7%)** | **0.588 (+9.1%)** | **0.776 (+2.9%)** | 0.631 | 0.287 |
+
+ > 以上结果基于5-shot,MMLU/CMMLU/OCNLI结果来自[FlagEval](https://flageval.baai.ac.cn/)
+ >
+ > The above results are 5-shot; the MMLU/CMMLU/OCNLI results are from [FlagEval](https://flageval.baai.ac.cn/)
+
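The README does not say how the bracketed percentages in the Mengzi3-13B-Base row are derived. The numbers are consistent with one reading: the relative margin over the best baseline score in the same column. The sketch below checks that reading against the table's own figures. The interpretation and the names `BASELINES`, `MENGZI`, and `relative_margin` are assumptions made for illustration, not something the source states.

```python
# Baseline scores copied from the evaluation table, in row order
# (Baichuan2-13B-Base ... MindSource-7B).
BASELINES = {
    "MMLU":  [0.530, 0.589, 0.551, 0.610, 0.557, 0.541, 0.436, 0.423, 0.498],
    "CMMLU": [0.489, 0.539, 0.495, 0.538, 0.524, 0.495, 0.424, 0.388, 0.425],
    "OCNLI": [0.433, 0.550, 0.754, 0.650, 0.426, 0.352, 0.356, 0.342, 0.528],
}

# Mengzi3-13B-Base scores from the last row of the same table.
MENGZI = {"MMLU": 0.651, "CMMLU": 0.588, "OCNLI": 0.776}


def relative_margin(score, baseline_scores):
    """Relative improvement of `score` over the best baseline (assumed definition)."""
    best = max(baseline_scores)
    return (score - best) / best


# Express each margin as a percentage rounded to one decimal place.
margins = {
    name: round(100 * relative_margin(MENGZI[name], BASELINES[name]), 1)
    for name in MENGZI
}
print(margins)
```

Under this reading the best baselines are InternLM2-20B on MMLU (0.610), Qwen-14B on CMMLU (0.539), and ChatGLM3-6B-base on OCNLI (0.754), and the computed margins match the +6.7%, +9.1%, and +2.9% shown in the table. GSM8K and HumanEval carry no percentage because Mengzi3-13B-Base does not hold the top score in those columns.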
+ # 协议/License Agreement
+
+ Mengzi3-13B-Base依照Apache 2.0协议开源,对学术研究完全开放,同时支持免费商用。如需申请商业许可证,请[联系我们](https://www.langboat.com/form?p=3),其他商务合作请联系[bd@langboat.com](mailto:bd@langboat.com)。
+
+ Mengzi3-13B-Base is open-sourced under the Apache 2.0 license. It is fully open for academic research and free for commercial use. To apply for a commercial license, please [contact us](https://www.langboat.com/en/form?p=3); for other business cooperation, please contact [bd@langboat.com](mailto:bd@langboat.com).