Update README.md
README.md (CHANGED)
@@ -17,7 +17,7 @@ inference: false
# MPT-7B

MPT-7B is a decoder-style transformer pretrained from scratch on 1T tokens of English text and code.
- This model was trained by [MosaicML](https://www.mosaicml.com)
+ This model was trained by [MosaicML](https://www.mosaicml.com).

MPT-7B is part of the family of MosaicPretrainedTransformer (MPT) models, which use a modified transformer architecture optimized for efficient training and inference.

@@ -32,7 +32,7 @@ This model uses the MosaicML LLM codebase, which can be found in the [llm-foundr

MPT-7B is

- * **Licensed for commercial use** (unlike [LLaMA](https://arxiv.org/abs/2302.13971)).
+ * **Licensed for the possibility of commercial use** (unlike [LLaMA](https://arxiv.org/abs/2302.13971)).
* **Trained on a large amount of data** (1T tokens like [LLaMA](https://arxiv.org/abs/2302.13971) vs. 300B for [Pythia](https://github.com/EleutherAI/pythia), 300B for [OpenLLaMA](https://github.com/openlm-research/open_llama), and 800B for [StableLM](https://github.com/Stability-AI/StableLM)).
* **Prepared to handle extremely long inputs** thanks to [ALiBi](https://arxiv.org/abs/2108.12409) (we finetuned [MPT-7B-StoryWriter-65k+](https://huggingface.co/mosaicml/mpt-7b-storywriter) on up to 65k inputs and can handle up to 84k vs. 2k-4k for other open source models).
* **Capable of fast training and inference** (via [FlashAttention](https://arxiv.org/pdf/2205.14135.pdf) and [FasterTransformer](https://github.com/NVIDIA/FasterTransformer))
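
For context on the long-input and fast-inference bullets above, here is a minimal sketch of how that capability is typically exercised through the `transformers` remote-code loading path, with the context window raised at load time. This is an illustrative sketch under stated assumptions, not content from this card: the 4096 override, the prompt, and the generation settings are assumptions chosen for the example.

```python
import torch
import transformers

name = "mosaicml/mpt-7b"

# ALiBi encodes position via attention biases rather than learned position
# embeddings, so the maximum sequence length can be raised at load time.
# 2048 is the pretraining context; 4096 here is an illustrative override.
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 4096

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,  # halves memory; move to a GPU for practical speeds
    trust_remote_code=True,      # MPT ships its modeling code on the Hub
)

# MPT-7B uses the EleutherAI/gpt-neox-20b tokenizer.
tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

inputs = tokenizer("MPT-7B is a decoder-style transformer", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because ALiBi biases attention by token distance instead of relying on learned position embeddings, raising `max_seq_len` this way requires no new weights; the same loading pattern applies to the finetuned variants listed below.
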
@@ -46,17 +46,17 @@ The following models are finetuned on MPT-7B:
Built by finetuning MPT-7B with a context length of 65k tokens on a filtered fiction subset of the [books3 dataset](https://huggingface.co/datasets/the_pile_books3).
At inference time, thanks to [ALiBi](https://arxiv.org/abs/2108.12409), MPT-7B-StoryWriter-65k+ can extrapolate even beyond 65k tokens.
We demonstrate generations as long as 80k tokens on a single A100-80GB GPU in our [blogpost](www.mosaicml.com/blog/mpt-7b).
- * License:
+ * License: Apache 2.0

* [MPT-7B-Instruct](https://huggingface.co/mosaicml/mpt-7b-instruct): a model for short-form instruction following.
Built by finetuning MPT-7B on a [dataset](https://huggingface.co/datasets/mosaicml/dolly_hhrlhf) we also release, derived from the [Databricks Dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) and the [Anthropic Helpful and Harmless (HH-RLHF)](https://huggingface.co/datasets/Anthropic/hh-rlhf) datasets.
- * License: _CC-By-SA-3.0_
+ * License: _CC-By-SA-3.0_
* [Demo on Hugging Face Spaces](https://huggingface.co/spaces/mosaicml/mpt-7b-instruct)

* [MPT-7B-Chat](https://huggingface.co/mosaicml/mpt-7b-chat): a chatbot-like model for dialogue generation.
Built by finetuning MPT-7B on the [ShareGPT-Vicuna](https://huggingface.co/datasets/jeffwan/sharegpt_vicuna), [HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3),
[Alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca), [HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf), and [Evol-Instruct](https://huggingface.co/datasets/victor123/evol_instruct_70k) datasets.
- * License: _CC-By-NC-SA-4.0_
+ * License: _CC-By-NC-SA-4.0_
* [Demo on Hugging Face Spaces](https://huggingface.co/spaces/mosaicml/mpt-7b-chat)

## Model Date

@@ -65,7 +65,7 @@ May 5, 2023

## Model License

- Apache-2.0
+ Apache-2.0

## Documentation

@@ -207,6 +207,10 @@ While great efforts have been taken to clean the pretraining data, it is possibl

If you're interested in [training](https://www.mosaicml.com/training) and [deploying](https://www.mosaicml.com/inference) your own MPT or LLMs on the MosaicML Platform, [sign up here](https://forms.mosaicml.com/demo?utm_source=huggingface&utm_medium=referral&utm_campaign=mpt-7b).

+ ## Disclaimer
+
+ The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.
+
## Citation

Please cite this model using the following format:

@@ -214,7 +218,8 @@ Please cite this model using the following format:
```
@online{MosaicML2023Introducing,
author = {MosaicML NLP Team},
- title = {Introducing MPT-7B: A New Standard for Open-Source,
+ title = {Introducing MPT-7B: A New Standard for Open-Source,
+ Commercially Usable LLMs},
year = {2023},
url = {www.mosaicml.com/blog/mpt-7b},
note = {Accessed: 2023-03-28}, % change this date