	---
	license: other
	license_name: license-yuan
	license_link: https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/LICENSE-Yuan
	---
	<div align="center">
	<h1>
	Yuan2.0-M32 Large Language Model
	</h1>
	</div>


	<div align="center">


	<a href="code_license">
	<img alt="Code License" src="https://img.shields.io/badge/Apache%202.0%20-green?style=flat&label=Code%20License&link=https%3A%2F%2Fgithub.com%2FIEIT-Yuan%2FYuan-2.0-MoE%3Ftab%3DApache-2.0-1-ov-file"/>
	</a>
	<a href="model_license">
	<img alt="Model License" src="https://img.shields.io/badge/Yuan2.0%20License-blue?style=flat&logoColor=blue&label=Model%20License&color=blue&link=https%3A%2F%2Fgithub.com%2FIEIT-Yuan%2FYuan-2.0%2Fblob%2Fmain%2FLICENSE-Yuan" />
	</a>

	</div>




	<p align="center">
	👾 <a href="https://www.modelscope.cn/profile/YuanLLM" target="_blank">ModelScope</a> • 🤗 <a href="https://huggingface.co/IEITYuan" target="_blank">Hugging Face</a> • 💬 <a href="https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/images/%E6%BA%90%E5%85%AC%E4%BC%97%E5%8F%B7%E4%BA%8C%E7%BB%B4%E7%A0%81.png" target="_blank">WeChat</a> • 📎 <a href="https://github.com/IEIT-Yuan/Yuan2.0-M32/blob/main/docs/Paper.pdf" target="_blank">Yuan2.0-M32 Paper</a>
	</p>




	## 1. Introduction


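	Inspur Information's Yuan2.0-M32 is a sparse Mixture-of-Experts (MoE) large language model built on Yuan2.0-2B as its base model. A new router network, the Attention Router, coordinates and schedules work across 32 experts, which substantially lowers the compute needed for inference while improving accuracy and inference performance. Yuan2.0-M32 was evaluated on several widely used benchmarks covering code generation, math problem solving, scientific question answering, and general knowledge. It shows competitive results across these tasks and surpasses Llama3-70B on the MATH (math problem solving) and ARC-C (scientific question answering) benchmarks.

	Basic information about Yuan2.0-M32: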

	+ Total parameters: 40B <br>
	+ Number of experts: 32 <br>
	+ Active experts: 2 <br>
	+ Active parameters: 3.7B <br>
	+ Training tokens: 2000B <br>
	+ Maximum sequence length: 16K <br>


	We have also released the <a href="https://github.com/IEIT-Yuan/Yuan2.0-M32/blob/main/docs/Paper.pdf" target="_blank">technical report</a> for Yuan2.0-M32, which gives further technical details and evaluation results.
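To make "2 active experts out of 32" concrete, here is a minimal sketch of generic top-2 softmax gating. It is not the Attention Router described in the technical report; the toy experts, the random router logits, and every function name are assumptions made purely for illustration.

```python
import math
import random

NUM_EXPERTS = 32  # number of experts, from the list above
TOP_K = 2         # active experts per token, from the list above


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts and renormalise their weights."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, router_logits):
    """Weighted sum over the selected experts only; the rest are never run."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits))


# Hypothetical toy experts: expert i multiplies its input by (i + 1).
experts = [lambda x, scale=i + 1: scale * x for i in range(NUM_EXPERTS)]

rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_top_k(logits)
print("selected experts and weights:", selected)
print("output for x = 1.0:", moe_forward(1.0, experts, logits))
```

Because only the selected experts are evaluated, the per-token cost tracks the active parameter count (3.7B) rather than the total (40B).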



	## 2. Model Downloads

	We provide download links for several model formats:

	| Model | Sequence Length | Format | Download |
	| :----------: | :------: | :-------: | :---------------------------: |
	| Yuan2.0-M32 | 16K | Megatron | [HuggingFace](https://huggingface.co/IEITYuan/Yuan2-M32) |
	| Yuan2.0-M32-HF | 16K | HuggingFace | [HuggingFace](https://huggingface.co/IEITYuan/Yuan2-M32-hf) |
	| Yuan2.0-M32-GGUF | 16K | GGUF | [HuggingFace](https://huggingface.co/IEITYuan/Yuan2-M32-gguf) |
	| Yuan2.0-M32-GGUF-INT4 | 16K | GGUF | [HuggingFace](https://huggingface.co/IEITYuan/Yuan2-M32-gguf-int4/) |
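The repositories above can also be fetched from a script. The sketch below assumes the `huggingface_hub` package is installed (`pip install huggingface_hub`); the short format keys, the helper names, and the local directory in the usage note are hypothetical, while the repository IDs are taken from the table.

```python
# Repository IDs copied from the download table; the short keys are an assumption.
REPOS = {
    "megatron": "IEITYuan/Yuan2-M32",
    "hf": "IEITYuan/Yuan2-M32-hf",
    "gguf": "IEITYuan/Yuan2-M32-gguf",
    "gguf-int4": "IEITYuan/Yuan2-M32-gguf-int4",
}


def repo_for(model_format):
    """Map a format key (case-insensitive) to its Hugging Face repository ID."""
    key = model_format.lower()
    if key not in REPOS:
        raise ValueError(f"unknown format {model_format!r}; choose from {sorted(REPOS)}")
    return REPOS[key]


def download(model_format, local_dir):
    """Download every file of the chosen repository into local_dir."""
    # Imported lazily so the mapping above can be used without the package.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_for(model_format), local_dir=local_dir)
```

For example, `download("gguf-int4", "./checkpoints/yuan2-m32-gguf-int4")` would place the INT4 GGUF files under a local `checkpoints` directory. The checkpoints are large, so check free disk space first.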


	## 3. Evaluation Results


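	3.1 Benchmarks 🏆


	Compared with a range of closed-source and open-source models, Yuan2.0-M32 achieves strong accuracy. We evaluate on HumanEval, GSM8K, MMLU, MATH, and ARC-Challenge, which test natural language understanding, knowledge, mathematical computation and reasoning, and code generation. Yuan2.0-M32 outperforms models such as Llama3-8B and Mixtral-8x7B on every evaluated task, and its overall capability is comparable to Llama3-70B.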



	| Model | HumanEval | GSM8K | MMLU | Math | ARC-C\* |
	| ------------------ | :---------------: | :------------: | :---------------: | :---------------: | :---------------: |
	| Llama3-70B | 81.7% | 93% | 80.3% | 50.4% | 93.3% |
	| Llama3-8B | 62.2% | 79.6% | 68.4% | 30% | 78.6% |
	| Phi-3-medium | 62.2% | 91.0% | 78.0% | - | 91.6% |
	| Phi-3-small | 61% | 89.6% | 75.7% | - | 90.7% |
	| Phi-3-mini | 58.5% | 82.5% | 68.8% | - | 84.9% |
	| Mixtral-8x22B | 45.1% | 78.6% | 77.8% | 41.8% | 91.3% |
	| Mixtral-8x7B | 40.2% | 58.4% | 70.86% | 28.4% | 85.9% |
	| Yuan2.0-M32 | 74.4% | 92.5% | 72.2% | 55.9% | 95.8% |


	\* __ARC-C__: ARC-Challenge, the harder subset of the ARC dataset, whose questions require deeper reasoning and broader background knowledge.

	-----

	3.2 Model Compute Efficiency

	| Model | Params (B) | Active Params (B) | GFLOPs/token (Inference) | GFLOPs/token (Fine-tune) | Mean Accuracy | Mean Accuracy / GFLOPs per token (Inference) |
	| ------------------ | :---------------: | :------------: | :---------------: | :---------------: | :---------------: | :---------------: |
	| Llama3-70B | 70 | 70 | 140 | 420 | 79.25 | 0.57 |
	| Llama3-8B | 8 | 8 | 16 | 48 | 64.15 | 4.00 |
	| Mixtral-8x22B | 141 | 39 | 78 | 234 | 72.38 | 0.93 |
	| Mixtral-8x7B | 47 | 12.9 | 25.8 | 77.3 | 60.83 | 2.36 |
	| Yuan2.0-M32 | 40 | 3.7 | 7.4 | 22.2 | 79.15 | 10.69 |
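The compute columns are consistent with a common rule of thumb: about 2 FLOPs per active parameter per token for inference and about 6 for fine-tuning. That rule is inferred from the figures in the table rather than stated alongside it, so treat the sketch below as an assumption-laden check. The 12.9B active-parameter value for the 8x7B model is likewise inferred from its 25.8 GFLOPs/token entry.

```python
# name: (active parameters in billions, mean accuracy), copied from the table.
# 12.9 for the 8x7B model is an assumption implied by its 25.8 GFLOPs/token.
MODELS = {
    "Llama3-70B": (70.0, 79.25),
    "Llama3-8B": (8.0, 64.15),
    "Mixtral-8x22B": (39.0, 72.38),
    "Mixtral-8x7B": (12.9, 60.83),
    "Yuan2.0-M32": (3.7, 79.15),
}


def inference_gflops_per_token(active_params_b):
    """Roughly 2 FLOPs per active parameter per token (forward pass only)."""
    return 2.0 * active_params_b


def finetune_gflops_per_token(active_params_b):
    """Roughly 6 FLOPs per active parameter per token (forward plus backward)."""
    return 6.0 * active_params_b


def efficiency(mean_accuracy, active_params_b):
    """Mean accuracy divided by inference GFLOPs per token (last table column)."""
    return mean_accuracy / inference_gflops_per_token(active_params_b)


for name, (active_b, accuracy) in MODELS.items():
    print(f"{name:14s} "
          f"inference={inference_gflops_per_token(active_b):6.1f} "
          f"finetune={finetune_gflops_per_token(active_b):6.1f} "
          f"efficiency={efficiency(accuracy, active_b):5.2f}")
```

The computed values agree with the table up to rounding in the last digit.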






	## 4. Quick Start


	4.1 Environment Setup

	We recommend using the latest Docker [image](https://hub.docker.com/r/yuanmodel/yuan2.0:m32) for Yuan2.0-M32.

	The container can be started with the following commands:

	```bash
	docker pull yuanmodel/yuan2.0:V1-base
	docker run --gpus all --privileged --ulimit stack=68719476736 --shm-size=1000G -itd -v /path/to/yuan_2.0:/workspace/yuan_2.0 -v /path/to/dataset:/workspace/dataset -v /path/to/checkpoints:/workspace/checkpoints --name your_name yuanmodel/yuan2.0:V1-base
	docker exec -it your_name bash
	```


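	4.2 Data Preprocessing

	We provide data preprocessing scripts; see the [data preprocessing documentation](./docs/data_process.md).

	4.3 Model Pretraining

	We provide pretraining documentation and [`example`](./examples) scripts; see the [pretraining documentation](./docs/pretrain.md) for usage.

	4.4 Inference Service

	- For a detailed deployment guide, see [vllm](https://github.com/IEIT-Yuan/Yuan2.0-M32/edit/main/vllm/README_Yuan_vllm.md)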


	## 5. Statement of Agreement

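	Use of the Yuan2.0 code and models must comply with the [Apache 2.0](https://github.com/xxxxxxE) open-source license and the [Yuan2.0 Model License Agreement](./LICENSE-Yuan). The Yuan2.0 models support commercial use, and no application for authorization is required. Please read and follow these terms. Do not use the open-source models, the code, or any derivatives of this open-source project for any purpose that may harm the country or society, or for any service that has not undergone safety assessment and filing.

	Although we took measures during training to ensure the compliance and accuracy of the data as far as possible, the model has a very large number of parameters and is subject to probabilistic randomness, so we cannot guarantee the accuracy of its output, and the model is easily misled by input instructions. This project assumes no responsibility for data security or public-opinion risks caused by the open-source models and code, or for any risks and liabilities arising from the model being misled, abused, disseminated, or improperly used. You bear sole and full responsibility for the risks and consequences of using, copying, distributing, or modifying the models in this open-source project.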
