sam-mosaic commited on
Commit
19180fa
1 Parent(s): c75058f

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +1 -1
README.md CHANGED
@@ -32,7 +32,7 @@ This model uses the MosaicML LLM codebase, which can be found in the [llm-foundr
32
 
33
  MPT-7B-8k is
34
 
35
- * **Licensed for the possibility of commercial use** (unlike [LLaMA](https://arxiv.org/abs/2302.13971)).
36
  * **Trained on a large amount of data** (1.5T tokens like [XGen](https://huggingface.co/Salesforce/xgen-7b-8k-base) vs. 1T for [LLaMA](https://arxiv.org/abs/2302.13971), 1T for [MPT-7B](https://www.mosaicml.com/blog/mpt-7b), 300B for [Pythia](https://github.com/EleutherAI/pythia), 300B for [OpenLLaMA](https://github.com/openlm-research/open_llama), and 800B for [StableLM](https://github.com/Stability-AI/StableLM)).
37
  * **Prepared to handle long inputs** thanks to [ALiBi](https://arxiv.org/abs/2108.12409). With ALiBi, the model can extrapolate beyond the 8k training sequence length to up to 10k, and with a few million tokens it can be finetuned to extrapolate much further.
38
  * **Capable of fast training and inference** via [FlashAttention](https://arxiv.org/pdf/2205.14135.pdf) and [FasterTransformer](https://github.com/NVIDIA/FasterTransformer)
 
32
 
33
  MPT-7B-8k is
34
 
35
+ * **Licensed for the possibility of commercial use.**
36
  * **Trained on a large amount of data** (1.5T tokens like [XGen](https://huggingface.co/Salesforce/xgen-7b-8k-base) vs. 1T for [LLaMA](https://arxiv.org/abs/2302.13971), 1T for [MPT-7B](https://www.mosaicml.com/blog/mpt-7b), 300B for [Pythia](https://github.com/EleutherAI/pythia), 300B for [OpenLLaMA](https://github.com/openlm-research/open_llama), and 800B for [StableLM](https://github.com/Stability-AI/StableLM)).
37
  * **Prepared to handle long inputs** thanks to [ALiBi](https://arxiv.org/abs/2108.12409). With ALiBi, the model can extrapolate beyond the 8k training sequence length to up to 10k, and with a few million tokens it can be finetuned to extrapolate much further.
38
  * **Capable of fast training and inference** via [FlashAttention](https://arxiv.org/pdf/2205.14135.pdf) and [FasterTransformer](https://github.com/NVIDIA/FasterTransformer)