abhi-mosaic committed
Commit: b1bd9bc
Parent: 7ce5792

update README

Files changed (1): README.md (+21, -17)
README.md CHANGED
@@ -72,37 +72,41 @@ model = transformers.AutoModelForCausalLM.from_pretrained(
   trust_remote_code=True
 )
 ```
-Note: This model requires that `trust_remote_code=True` be passed to the `from_pretrained` method.
+Note: This model requires that `trust_remote_code=True` be passed to the `from_pretrained` method.
 This is because we use a custom `MPT` model architecture that is not yet part of the Hugging Face `transformers` package.
 `MPT` includes options for many training efficiency features such as [FlashAttention](https://arxiv.org/pdf/2205.14135.pdf), [ALiBi](https://arxiv.org/abs/2108.12409), [QK LayerNorm](https://arxiv.org/abs/2010.04245), and more.
 
-To use the optimized [triton implementation](https://github.com/openai/triton) of FlashAttention, you can load the model with `attn_impl='triton'` and move the model to `bfloat16`:
+To use the optimized [triton implementation](https://github.com/openai/triton) of FlashAttention, you can load the model on GPU (`cuda:0`) with `attn_impl='triton'` and with `bfloat16` precision:
 ```python
-config = transformers.AutoConfig.from_pretrained(
-  'mosaicml/mpt-7b-chat',
-  trust_remote_code=True
-)
+import torch
+import transformers
+
+name = 'mosaicml/mpt-7b-chat'
+
+config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
 config.attn_config['attn_impl'] = 'triton'
+config.init_device = 'cuda:0' # For fast initialization directly on GPU!
 
 model = transformers.AutoModelForCausalLM.from_pretrained(
-  'mosaicml/mpt-7b-chat',
+  name,
   config=config,
-  torch_dtype=torch.bfloat16,
+  torch_dtype=torch.bfloat16, # Load model weights in bfloat16
   trust_remote_code=True
 )
-model.to(device='cuda:0')
 ```
 
 Although the model was trained with a sequence length of 2048, ALiBi enables users to increase the maximum sequence length during finetuning and/or inference. For example:
 
 ```python
-config = transformers.AutoConfig.from_pretrained(
-  'mosaicml/mpt-7b-chat',
-  trust_remote_code=True
-)
-config.update({"max_seq_len": 4096})
+import transformers
+
+name = 'mosaicml/mpt-7b-chat'
+
+config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
+config.max_seq_len = 4096 # (input + output) tokens can now be up to 4096
+
 model = transformers.AutoModelForCausalLM.from_pretrained(
-  'mosaicml/mpt-7b-chat',
+  name,
   config=config,
   trust_remote_code=True
 )
@@ -163,11 +167,11 @@ Please cite this model using the following format:
 ```
 @online{MosaicML2023Introducing,
     author = {MosaicML NLP Team},
-    title = {Introducing MPT-7B: A New Standard for Open-Source,
+    title = {Introducing MPT-7B: A New Standard for Open-Source,
     Commercially Usable LLMs},
     year = {2023},
     url = {www.mosaicml.com/blog/mpt-7b},
     note = {Accessed: 2023-03-28}, % change this date
     urldate = {2023-03-28} % change this date
 }
-```
+```
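
For readers trying out the updated snippets, the sketch below (not part of the commit above) puts them together end to end: it loads the model with the triton FlashAttention kernel, `bfloat16` weights, GPU initialization, and an extended `max_seq_len`, then runs a short generation. The tokenizer name (`EleutherAI/gpt-neox-20b`), the prompt, and the sampling settings are assumptions for illustration, not taken from the README.

```python
# Illustrative sketch only; assumes the MPT chat model reuses the
# EleutherAI/gpt-neox-20b tokenizer and that a CUDA GPU with
# triton/FlashAttention support is available.
import torch
import transformers

name = 'mosaicml/mpt-7b-chat'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'triton'  # optimized FlashAttention kernel
config.init_device = 'cuda:0'               # initialize weights directly on the GPU
config.max_seq_len = 4096                   # ALiBi allows exceeding the 2048 training length

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,  # load weights in bfloat16
    trust_remote_code=True
)

# Assumed tokenizer for the MPT family
tokenizer = transformers.AutoTokenizer.from_pretrained('EleutherAI/gpt-neox-20b')

inputs = tokenizer('Here is a short poem about open-source LLMs:\n', return_tensors='pt').to('cuda:0')
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.8)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because `config.init_device = 'cuda:0'` materializes the weights directly on the GPU, the explicit `model.to(device='cuda:0')` call from the old README is no longer needed, which is why the commit removes it.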