	---
	license: apache-2.0
	---
	<p align="center">
	<img width="150px" alt="OpenMoE" src="https://github.com/XueFuzhao/OpenMoE/blob/main/logo.jpg?raw=true">
	</p>
	<p align="center"><a href="https://github.com/XueFuzhao/OpenMoE/tree/main">[GitHub]</a> | <a href="https://colab.research.google.com/drive/1xIfIVafnlCP2XVICmRwkUFK3cwTJYjCY#scrollTo=62T-2mH_tsjG">[Colab Demo]</a> | <a href="https://huggingface.co/OrionZheng">[Huggingface]</a> | <a href="https://discord.gg/bjGnGfjegU">[Discord]</a> | <a href="https://twitter.com/xuefz/status/1693696988611739947?s=61&t=Xc2k2W7vU_hlpNizGDCmOw">[Twitter]</a> | <a href="https://xuefuzhao.notion.site/Aug-2023-OpenMoE-v0-2-Release-43808efc0f5845caa788f2db52021879">[Blog]</a></p>
	<hr>

	# OpenMoE-8B (400B tokens)
	OpenMoE is a project aimed at igniting the open-source MoE community! We are releasing a family of open-sourced Mixture-of-Experts (MoE) Large Language Models.

	Our project began in the summer of 2023. On August 22, 2023, we released the first batch of intermediate checkpoints (OpenMoE-base and OpenMoE-8B), along with the data and code [[Twitter]](https://twitter.com/xuefz/status/1693696988611739947?s=61&t=Xc2k2W7vU_hlpNizGDCmOw). Training of OpenMoE-8B was then completed in November 2023. After that, we began exploring a 34B-scale model, and that work is still ongoing.

	As a small student team, instead of pursuing the best model with better data, computation, and human power, we are devoted to fully sharing our training data, strategies, model architecture, weights, and everything else we have with the community. We hope this project will promote research in this promising field and invite more contributors to work on open-sourced MoE projects together!

	[2024.01.12] The paper for the project and more evaluations are underway. For more information about the model, training, and evaluations, please visit our GitHub [repository](https://github.com/XueFuzhao/OpenMoE/tree/main).


	## Model Weights
	Currently, three models are released in total: OpenMoE-base, OpenMoE-8B/8B-Chat, and OpenMoE-34B (at 200B tokens).

	The table below lists the 8B and 8B-Chat models, which have completed training on 1.1T tokens.

	| Model Name | Description | #Param | Huggingface |
	|----------------|-------------------------------------------------|----------|-------------|
	| OpenMoE-8B (1.1T) | 8B MoE with FLOPs comparable to a 2B LLaMA (no SFT) | 8B | [Link](https://huggingface.co/OrionZheng/openmoe-8b) |
	| OpenMoE-8B-Chat (1.1T+SFT) | OpenMoE-8B-1.1T with supervised fine-tuning on the [WildChat GPT-4 Subset](https://huggingface.co/datasets/allenai/WildChat-nontoxic) | 8B | [Link](https://huggingface.co/OrionZheng/openmoe-8b-chat) |


	We also provide all our intermediate checkpoints (base, 8B, 34B) for research purposes.

	| Model Name | Description | #Param | Huggingface |
	|----------------|-------------------------------------------------|----------|-------------|
	| OpenMoE-34B-200B | 34B MoE with FLOPs comparable to a 7B LLaMA (no SFT) | 34B | [Link](https://huggingface.co/OrionZheng/openmoe-34b-200B) |
	| OpenMoE-8B-200B | 8B MoE with FLOPs comparable to a 2B LLaMA (no SFT) | 8B | [Link](https://huggingface.co/OrionZheng/openmoe-8b-200B) |
	| OpenMoE-8B-400B | 8B MoE with FLOPs comparable to a 2B LLaMA (no SFT) | 8B | [Link](https://huggingface.co/OrionZheng/openmoe-8b-400B) |
	| OpenMoE-8B-600B | 8B MoE with FLOPs comparable to a 2B LLaMA (no SFT) | 8B | [Link](https://huggingface.co/OrionZheng/openmoe-8b-600B) |
	| OpenMoE-8B-800B | 8B MoE with FLOPs comparable to a 2B LLaMA (no SFT) | 8B | [Link](https://huggingface.co/OrionZheng/openmoe-8b-800B) |
	| OpenMoE-8B-1T | 8B MoE with FLOPs comparable to a 2B LLaMA (no SFT) | 8B | [Link](https://huggingface.co/OrionZheng/openmoe-8b-1T) |
	| OpenMoE-base (128B) | A small MoE model for debugging only | 637M | [Link](https://huggingface.co/OrionZheng/openmoe-base) |
	| OpenLLaMA-base (128B) | A dense counterpart of OpenMoE-base | 310M | [Link](https://huggingface.co/fuzhao/OpenLLaMA_Base) |


	The base models, which were trained on 128 billion tokens, served primarily for debugging. After validating the effectiveness of our model architecture, we did not pursue further training. Consequently, their performance may be poor, and the checkpoints are not suitable for practical applications. Better performance can be observed with our 8B or 34B versions.

	OpenMoE-8B, with 4 MoE layers and 32 experts, has been trained on 1.1T tokens. The SFT version was released after we fine-tuned OpenMoE-8B-1.1T on the GPT-4 subset of the [WildChat](https://huggingface.co/datasets/allenai/WildChat-nontoxic) dataset. The intermediate checkpoints at 200B, 400B, 600B, 800B, and 1T tokens can be used to study the training dynamics of the MoE architecture.
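
For a training-dynamics study, it can help to iterate over the checkpoints in order. The following is a minimal sketch: the repository IDs come from the table above, while `checkpoint_repos` and `evaluate_over_training` are hypothetical helper names, and the `evaluate` callback (for example, one that loads a checkpoint and returns a validation loss) is something you would supply yourself.

```python
# Token budgets of the intermediate 8B checkpoints listed in the table above.
TOKEN_BUDGETS = ["200B", "400B", "600B", "800B", "1T"]


def checkpoint_repos(owner="OrionZheng", prefix="openmoe-8b"):
    """Return (label, repo_id) pairs ordered by training progress."""
    repos = [(budget, f"{owner}/{prefix}-{budget}") for budget in TOKEN_BUDGETS]
    # The fully trained 1.1T-token model has no suffix in its repo name.
    repos.append(("1.1T", f"{owner}/{prefix}"))
    return repos


def evaluate_over_training(evaluate, repos=None):
    """Apply a user-supplied `evaluate(repo_id)` to every checkpoint.

    `evaluate` is an assumption of this sketch: any callable that takes a
    repo ID and returns a metric for that checkpoint.
    """
    repos = checkpoint_repos() if repos is None else repos
    return {label: evaluate(repo_id) for label, repo_id in repos}


if __name__ == "__main__":
    for label, repo_id in checkpoint_repos():
        print(f"{label:>5}  {repo_id}")
```

Keeping the evaluation behind a callback means the same loop can be reused for loss curves, routing statistics, or any other per-checkpoint measurement.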

	We are still training OpenMoE-34B, an MoE model with 8 MoE layers and 32 experts. We have released the intermediate checkpoint trained on 200B tokens on Hugging Face. If you are interested in the latest checkpoint, please feel free to drop Fuzhao an email (f.xue@u.nus.edu).

	## Get Started

	### Inference with PyTorch
	Our PyTorch implementation is supported by [Colossal AI](https://github.com/hpcaitech/ColossalAI). You can install our forked version directly for easier setup:
	```
	# Python version: 3.10.12
	# Install ColossalAI
	git clone --branch my_openmoe https://github.com/Orion-Zheng/ColossalAI.git
	pip install ./ColossalAI
	python -m pip install -r ./ColossalAI/examples/language/openmoe/requirements.txt
	```
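
As a quick sanity check after the install steps above, the following minimal sketch reports whether the packages the inference code relies on can be found. The package list is an assumption based on the imports used in this README (`torch`, `transformers`) plus `colossalai`; `check_environment` is a hypothetical helper, so adjust it to match your setup.

```python
import importlib.util
import sys

# Assumed package list; extend it if your setup needs more.
REQUIRED = ("torch", "transformers", "colossalai")


def check_environment(packages=REQUIRED):
    """Map each top-level package name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, found in check_environment().items():
        print(f"{name:<14}{'ok' if found else 'MISSING'}")
```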

	Then you can run inference with the following code on an A100 80GB machine.
	```
	import torch
	from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM

	# Path to the checkpoint directory
	model_path = "ckpts/openmoe-8b-chat"
	config = AutoConfig.from_pretrained(model_path)
	tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
	# Load the weights in bfloat16 and let device_map='auto' place them on the available devices
	model = AutoModelForCausalLM.from_pretrained(
	    model_path,
	    torch_dtype=torch.bfloat16,
	    trust_remote_code=True,
	    device_map='auto'
	)
	query = 'Question: How do I kill a process? Answer:'
	prompt = f'''<<SYS>>
	You are a helpful, respectful and honest assistant.
	<</SYS>>

	<s>[INST] {query} [/INST]'''

	inputs = tokenizer(prompt, return_tensors="pt").to('cuda')
	sample = model.generate(**inputs, max_new_tokens=32)
	print(tokenizer.decode(sample[0]))
	```
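
The prompt format in the snippet above can be factored into a small helper so that different queries or system messages are easy to swap in. This is a minimal sketch: `build_prompt` and `DEFAULT_SYSTEM` are hypothetical names, not part of the OpenMoE code, and the template simply mirrors the string used above.

```python
DEFAULT_SYSTEM = "You are a helpful, respectful and honest assistant."


def build_prompt(query, system=DEFAULT_SYSTEM):
    """Wrap a user query in the chat template from the inference snippet."""
    return f"<<SYS>>\n{system}\n<</SYS>>\n\n<s>[INST] {query} [/INST]"


if __name__ == "__main__":
    print(build_prompt("Question: How do I kill a process? Answer:"))
```

The returned string can be passed to the tokenizer in place of the inline `prompt` variable.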


	If you don't have a GPU on hand, don't worry! You can still try our model on Colab (note: this requires a $10 Colab Pro plan). You can experiment with OpenMoE-8B-Chat directly in [this Colab notebook](https://colab.research.google.com/drive/1xIfIVafnlCP2XVICmRwkUFK3cwTJYjCY).
	- Running OpenMoE-8B requires ~49GB of memory in float32 or ~23GB in bfloat16. It can be executed on a Colab `CPU High-RAM` (in float32) runtime or an `A100-40GB` (in bfloat16) runtime, both of which require Colab Pro. float16 precision is not recommended because it sometimes degrades performance.
	- Running OpenMoE-34B requires ~89GB of memory in bfloat16 or ~180GB in float32. To run inference across multiple devices or offload model weights to RAM, please refer to the script [here](https://github.com/XueFuzhao/OpenMoE/blob/main/script/inference_on_multi_devices.py).
	- A more detailed environment setup script can be found [here](https://github.com/XueFuzhao/OpenMoE/blob/main/env/prepare_env.sh); if you use Docker, you can refer to the Dockerfile [here](https://github.com/XueFuzhao/OpenMoE/blob/main/env/openmoe_infer_dockerfile). Note: you don't need the t5x and JAX dependencies if you use our [Hugging Face checkpoints](https://huggingface.co/OrionZheng/openmoe-8b-chat) without converting the JAX checkpoints.
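
The memory figures above can be turned into a simple lookup for choosing a precision. The sketch below is illustrative only: `MEMORY_GB` copies the approximate numbers from the notes above, `pick_dtype` is a hypothetical helper, and it treats the figures as lower bounds without accounting for activations or generation overhead.

```python
# Approximate memory needs in GB, copied from the notes above.
MEMORY_GB = {
    "openmoe-8b": {"float32": 49, "bfloat16": 23},
    "openmoe-34b": {"float32": 180, "bfloat16": 89},
}


def pick_dtype(model, available_gb):
    """Return the highest precision whose stated footprint fits, else None.

    float16 is deliberately left out because the notes above advise against it.
    """
    needs = MEMORY_GB[model]
    for dtype in ("float32", "bfloat16"):
        if available_gb >= needs[dtype]:
            return dtype
    return None


if __name__ == "__main__":
    for gb in (16, 40, 80):
        print(gb, "GB ->", pick_dtype("openmoe-8b", gb))
```

A result of `None` suggests falling back to multi-device inference or weight offloading, as covered by the script linked above.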

	We also provide a Colab [tutorial](https://colab.research.google.com/drive/1eIT1rtG7pORRQAYtQoMOAekUg7aZLDdn) demonstrating the JAX checkpoint conversion.


	## License

	Our code is released under the Apache 2.0 License.

	Since the models are trained on the RedPajama and The Stack datasets, please check the licenses of these two datasets before using the models.


	## Authors

	The following authors currently contribute to this project:

	[Fuzhao Xue](https://xuefuzhao.github.io/), [Zian Zheng](https://zheng-zian-andy.com), [Yao Fu](https://franxyao.github.io/), [Jinjie Ni](http://jinjie.one/), [Zangwei Zheng](https://zhengzangw.github.io/), [Wangchunshu Zhou](https://michaelzhouwang.github.io/), [Yang You](https://www.comp.nus.edu.sg/~youy/)

	## Acknowledgement
	The computational resources for this project were generously provided by the [Google TPU Research Cloud (TRC)](https://sites.research.google/trc/about/). We extend our heartfelt thanks to TRC for their invaluable support, which has been fundamental to the success of our work. We are also extremely grateful to the [ColossalAI Team](https://github.com/hpcaitech/ColossalAI) for their tremendous support with the PyTorch implementation, especially [Xuanlei Zhao](https://oahzxl.github.io/) and [Wenhao Chen](https://github.com/CWHer), who made training and inference of OpenMoE on GPUs a reality.

	## Citation

	Please cite the repo if you use the model and code in this repo.

	```bibtex
	@misc{openmoe2023,
	author = {Fuzhao Xue and Zian Zheng and Yao Fu and Jinjie Ni and Zangwei Zheng and Wangchunshu Zhou and Yang You},
	title = {OpenMoE: Open Mixture-of-Experts Language Models},
	year = {2023},
	publisher = {GitHub},
	journal = {GitHub repository},
	howpublished = {\url{https://github.com/XueFuzhao/OpenMoE}},
	}
	```