jacobfulano committed
Commit e27b4b2
1 Parent(s): ee3acd5

Update README.md

Files changed (1):
README.md +1 -1
README.md CHANGED
@@ -79,7 +79,7 @@ Note: This model requires that `trust_remote_code=True` be passed to the `from_p
 This is because we use a custom `MPT` model architecture that is not yet part of the Hugging Face `transformers` package.
 `MPT` includes options for many training efficiency features such as [FlashAttention](https://arxiv.org/pdf/2205.14135.pdf), [ALiBi](https://arxiv.org/abs/2108.12409), [QK LayerNorm](https://arxiv.org/abs/2010.04245), and more.
 
-To use the optimized [triton implementation](https://github.com/openai/triton) of FlashAttention (`pip install flash_attn`), you can load the model with `attn_impl='triton'` and move the model to `bfloat16`:
+To use the optimized [triton implementation](https://github.com/openai/triton) of FlashAttention, you can load the model with `attn_impl='triton'` and move the model to `bfloat16`:
 ```python
 config = transformers.AutoConfig.from_pretrained('mosaicml/mpt-7b', trust_remote_code=True)
 config.attn_config['attn_impl'] = 'triton'
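
The hunk above shows only the first lines of the README's loading example. For reference, a minimal end-to-end sketch of what the edited instructions describe, assuming the triton FlashAttention dependency is installed and a CUDA device is available (the `'cuda:0'` device string is an assumption):

```python
import torch
import transformers

# Load the MPT-7B config and switch the attention implementation to the
# triton FlashAttention kernel described in the README text above.
config = transformers.AutoConfig.from_pretrained('mosaicml/mpt-7b', trust_remote_code=True)
config.attn_config['attn_impl'] = 'triton'

# Instantiate the model in bfloat16, the dtype the README pairs with the
# triton kernel, then move it to a GPU ('cuda:0' is an assumed device).
model = transformers.AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-7b',
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
model.to(device='cuda:0')
```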