abhi-mosaic committed
Commit b1bd9bc • 1 Parent(s): 7ce5792
update README
README.md CHANGED
@@ -72,37 +72,41 @@ model = transformers.AutoModelForCausalLM.from_pretrained(
   trust_remote_code=True
 )
 ```
-Note: This model requires that `trust_remote_code=True` be passed to the `from_pretrained` method.
+Note: This model requires that `trust_remote_code=True` be passed to the `from_pretrained` method.
 This is because we use a custom `MPT` model architecture that is not yet part of the Hugging Face `transformers` package.
 `MPT` includes options for many training efficiency features such as [FlashAttention](https://arxiv.org/pdf/2205.14135.pdf), [ALiBi](https://arxiv.org/abs/2108.12409), [QK LayerNorm](https://arxiv.org/abs/2010.04245), and more.

-To use the optimized [triton implementation](https://github.com/openai/triton) of FlashAttention, you can load the model with `attn_impl='triton'` and
+To use the optimized [triton implementation](https://github.com/openai/triton) of FlashAttention, you can load the model on GPU (`cuda:0`) with `attn_impl='triton'` and with `bfloat16` precision:
 ```python
+import torch
+import transformers
+
+name = 'mosaicml/mpt-7b-chat'
+
+config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
 config.attn_config['attn_impl'] = 'triton'
+config.init_device = 'cuda:0' # For fast initialization directly on GPU!

 model = transformers.AutoModelForCausalLM.from_pretrained(
+  name,
   config=config,
-  torch_dtype=torch.bfloat16,
+  torch_dtype=torch.bfloat16, # Load model weights in bfloat16
   trust_remote_code=True
 )
-model.to(device='cuda:0')
 ```

 Although the model was trained with a sequence length of 2048, ALiBi enables users to increase the maximum sequence length during finetuning and/or inference. For example:

 ```python
+import transformers
+
+name = 'mosaicml/mpt-7b-chat'
+
+config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
+config.max_seq_len = 4096 # (input + output) tokens can now be up to 4096
+
 model = transformers.AutoModelForCausalLM.from_pretrained(
+  name,
   config=config,
   trust_remote_code=True
 )

@@ -163,11 +167,11 @@ Please cite this model using the following format:
 ```
 @online{MosaicML2023Introducing,
     author = {MosaicML NLP Team},
-    title = {Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs},
+    title = {Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs},
     year = {2023},
     url = {www.mosaicml.com/blog/mpt-7b},
     note = {Accessed: 2023-03-28}, % change this date
     urldate = {2023-03-28} % change this date
 }
-```
+```