jfrankle committed
Commit d830485
1 Parent(s): b85bb66

Update README.md

Files changed (1)
  1. README.md +12 -7
README.md CHANGED
@@ -17,7 +17,7 @@ inference: false
# MPT-7B

MPT-7B is a decoder-style transformer pretrained from scratch on 1T tokens of English text and code.
- This model was trained by [MosaicML](https://www.mosaicml.com) and is **open-sourced for commercial use** (_Apache-2.0_).
+ This model was trained by [MosaicML](https://www.mosaicml.com).

MPT-7B is part of the family of MosaicPretrainedTransformer (MPT) models, which use a modified transformer architecture optimized for efficient training and inference.

@@ -32,7 +32,7 @@ This model uses the MosaicML LLM codebase, which can be found in the [llm-foundr

MPT-7B is

- * **Licensed for commercial use** (unlike [LLaMA](https://arxiv.org/abs/2302.13971)).
+ * **Licensed for the possibility of commercial use** (unlike [LLaMA](https://arxiv.org/abs/2302.13971)).
* **Trained on a large amount of data** (1T tokens like [LLaMA](https://arxiv.org/abs/2302.13971) vs. 300B for [Pythia](https://github.com/EleutherAI/pythia), 300B for [OpenLLaMA](https://github.com/openlm-research/open_llama), and 800B for [StableLM](https://github.com/Stability-AI/StableLM)).
* **Prepared to handle extremely long inputs** thanks to [ALiBi](https://arxiv.org/abs/2108.12409) (we finetuned [MPT-7B-StoryWriter-65k+](https://huggingface.co/mosaicml/mpt-7b-storywriter) on up to 65k inputs and can handle up to 84k vs. 2k-4k for other open source models).
* **Capable of fast training and inference** (via [FlashAttention](https://arxiv.org/pdf/2205.14135.pdf) and [FasterTransformer](https://github.com/NVIDIA/FasterTransformer))
@@ -46,17 +46,17 @@ The following models are finetuned on MPT-7B:
Built by finetuning MPT-7B with a context length of 65k tokens on a filtered fiction subset of the [books3 dataset](https://huggingface.co/datasets/the_pile_books3).
At inference time, thanks to [ALiBi](https://arxiv.org/abs/2108.12409), MPT-7B-StoryWriter-65k+ can extrapolate even beyond 65k tokens.
We demonstrate generations as long as 80k tokens on a single A100-80GB GPU in our [blogpost](www.mosaicml.com/blog/mpt-7b).
- * License: Creative Commons Attribution Non Commercial 4.0
+ * License: Apache 2.0

* [MPT-7B-Instruct](https://huggingface.co/mosaicml/mpt-7b-instruct): a model for short-form instruction following.
Built by finetuning MPT-7B on a [dataset](https://huggingface.co/datasets/mosaicml/dolly_hhrlhf) we also release, derived from the [Databricks Dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) and the [Anthropic Helpful and Harmless (HH-RLHF)](https://huggingface.co/datasets/Anthropic/hh-rlhf) datasets.
- * License: _CC-By-SA-3.0_ (commercial use permitted)
+ * License: _CC-By-SA-3.0_
* [Demo on Hugging Face Spaces](https://huggingface.co/spaces/mosaicml/mpt-7b-instruct)

* [MPT-7B-Chat](https://huggingface.co/mosaicml/mpt-7b-chat): a chatbot-like model for dialogue generation.
Built by finetuning MPT-7B on the [ShareGPT-Vicuna](https://huggingface.co/datasets/jeffwan/sharegpt_vicuna), [HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3),
[Alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca), [HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf), and [Evol-Instruct](https://huggingface.co/datasets/victor123/evol_instruct_70k) datasets.
- * License: _CC-By-NC-SA-4.0_ (non-commercial use only)
+ * License: _CC-By-NC-SA-4.0_
* [Demo on Hugging Face Spaces](https://huggingface.co/spaces/mosaicml/mpt-7b-chat)

## Model Date
@@ -65,7 +65,7 @@ May 5, 2023

## Model License

- Apache-2.0 (commercial use permitted)
+ Apache-2.0

## Documentation

@@ -207,6 +207,10 @@ While great efforts have been taken to clean the pretraining data, it is possibl

If you're interested in [training](https://www.mosaicml.com/training) and [deploying](https://www.mosaicml.com/inference) your own MPT or LLMs on the MosaicML Platform, [sign up here](https://forms.mosaicml.com/demo?utm_source=huggingface&utm_medium=referral&utm_campaign=mpt-7b).

+ ## Disclaimer
+
+ The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.
+
## Citation

Please cite this model using the following format:
@@ -214,7 +218,8 @@ Please cite this model using the following format:
```
@online{MosaicML2023Introducing,
    author = {MosaicML NLP Team},
-     title = {Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs},
+    title = {Introducing MPT-7B: A New Standard for Open-Source,
+    ly Usable LLMs},
    year = {2023},
    url = {www.mosaicml.com/blog/mpt-7b},
    note = {Accessed: 2023-03-28}, % change this date
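
As context for the ALiBi feature highlighted in the diff above: because ALiBi replaces positional embeddings, the usable context window is a config value rather than a hard architectural limit, so it can be raised at load time. Below is a minimal, illustrative sketch assuming the Hugging Face `transformers` Auto classes and the MPT custom configuration (`trust_remote_code=True`, `max_seq_len`); the model name and sequence length are placeholders, not a prescribed setup.

```python
import transformers

name = 'mosaicml/mpt-7b'

# MPT ships a custom model/config implementation, so trust_remote_code=True
# is required when loading through the generic Auto classes.
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 4096  # ALiBi allows inference beyond the 2048-token training context

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    trust_remote_code=True,
)

# MPT-7B uses the EleutherAI/gpt-neox-20b tokenizer.
tokenizer = transformers.AutoTokenizer.from_pretrained('EleutherAI/gpt-neox-20b')

inputs = tokenizer('MPT-7B is a decoder-style transformer', return_tensors='pt')
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```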