---
license: cc-by-sa-3.0
datasets:
- mosaicml/dolly_hhrlhf
tags:
- Composer
- MosaicML
- llm-foundry
---

# MPT-7B-Instruct

MPT-7B-Instruct is a model for short-form instruction following.
It is built by finetuning [MPT-7B (Base)](https://huggingface.co/mosaicml/mpt-7b) on a [dataset](https://huggingface.co/datasets/sam-mosaic/dolly_hhrlhf) derived from the [Databricks Dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) and the [Anthropic Helpful and Harmless (HH-RLHF)](https://huggingface.co/datasets/Anthropic/hh-rlhf) datasets.

* License: _CC-By-SA-3.0_ (commercial use permitted)
* [Online Demo](https://huggingface.co/spaces/mosaicml/mpt-7b-instruct)

This model was trained by [MosaicML](https://www.mosaicml.com) and follows a modified decoder-only transformer architecture.

## Model Date

May 5, 2023

## Model License

CC-By-SA-3.0 (commercial use permitted)

## Documentation

* [Blog post: Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs](https://www.mosaicml.com/blog/mpt-7b)
* [Codebase (mosaicml/llm-foundry repo)](https://github.com/mosaicml/llm-foundry/)
* Questions: Feel free to contact us via the [MosaicML Community Slack](https://join.slack.com/t/mosaicml-community/shared_invite/zt-w0tiddn9-WGTlRpfjcO9J5jyrMub1dg)!

### Example Question/Instruction

**Longboi24**

> What is a quoll?

**MPT-7B-Instruct**

> A Quoll (pronounced “cool”) is one of Australia’s native carnivorous marsupial mammals, which are also known as macropods or wallabies in other parts around Asia and South America

## How to Use

Note: This model requires that `trust_remote_code=True` be passed to the `from_pretrained` method. This is because we use a custom model architecture that is not yet part of the `transformers` package.

It includes options for many training efficiency features such as [FlashAttention (Dao et al. 2022)](https://arxiv.org/pdf/2205.14135.pdf), [ALiBi](https://arxiv.org/abs/2108.12409), QK LayerNorm, and more.

```python
import torch
import transformers

model = transformers.AutoModelForCausalLM.from_pretrained(
  'mosaicml/mpt-7b-instruct',
  trust_remote_code=True,
  torch_dtype=torch.bfloat16
)
```

To use the optimized triton implementation of FlashAttention, you can load the model with `attn_impl='triton'` and move it to `bfloat16` like so:

```python
model = transformers.AutoModelForCausalLM.from_pretrained(
  'mosaicml/mpt-7b-instruct',
  trust_remote_code=True,
  torch_dtype=torch.bfloat16,
  attn_impl='triton'
)
model.to(device='cuda:0', dtype=torch.bfloat16)
```

Although the model was trained with a sequence length of 2048, ALiBi enables users to increase the maximum sequence length during finetuning and/or inference. For example:

```python
config = transformers.AutoConfig.from_pretrained(
  'mosaicml/mpt-7b-instruct',
  trust_remote_code=True
)
config.update({"max_seq_len": 4096})
model = transformers.AutoModelForCausalLM.from_pretrained(
  'mosaicml/mpt-7b-instruct',
  config=config,
  trust_remote_code=True
)
```

This model was trained with the [EleutherAI/gpt-neox-20b](https://huggingface.co/EleutherAI/gpt-neox-20b) tokenizer.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
```
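Once the model and tokenizer are loaded, you can wrap them in a standard `transformers` text-generation pipeline. The snippet below is a minimal sketch: the dolly-style prompt template (`### Instruction:` / `### Response:`) is an assumption based on the dolly_hhrlhf finetuning data rather than a format specified in this card, so adapt it as needed.

```python
import torch
import transformers
from transformers import AutoTokenizer

# Assumed dolly-style instruction template (based on the dolly_hhrlhf finetuning
# data); adjust it if your prompt format differs.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = transformers.AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-7b-instruct',
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)

# The pipeline moves the model to the requested device and handles tokenization.
generator = transformers.pipeline(
    'text-generation',
    model=model,
    tokenizer=tokenizer,
    device='cuda:0',
)

prompt = PROMPT_TEMPLATE.format(instruction="What is a quoll?")
outputs = generator(
    prompt,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    use_cache=True,
)
print(outputs[0]['generated_text'])
```

Note that the pipeline returns the prompt together with the completion in `generated_text`, so you may want to strip the prompt before displaying the response.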
## Model Description

The architecture is a modification of a standard decoder-only transformer.

The model has been modified from a standard transformer in the following ways:
* It uses [FlashAttention](https://arxiv.org/pdf/2205.14135.pdf)
* It uses [ALiBi (Attention with Linear Biases)](https://arxiv.org/abs/2108.12409) and does not use positional embeddings
* It does not use biases

| Hyperparameter | Value |
|----------------|-------|
| n_parameters | 6.7B |
| n_layers | 32 |
| n_heads | 32 |
| d_model | 4096 |
| vocab size | 50432 |
| sequence length | 2048 |

## PreTraining Data

For more details on the pretraining process, see [MPT-7B](https://huggingface.co/mosaicml/mpt-7b).

The data was tokenized using the [EleutherAI/gpt-neox-20b](https://huggingface.co/EleutherAI/gpt-neox-20b) tokenizer.

## Training Configuration

This model was finetuned on 440 A100-40GBs for about half a day using the [MosaicML Platform](https://www.mosaicml.com/platform).

## Acknowledgements

This model was finetuned by Sam Havens and the MosaicML NLP team.