Commit ee5674f by daking (1 parent: 7d303f9)

Update README.md

Files changed (1): README.md (+7 -1)
README.md CHANGED
@@ -8,7 +8,7 @@ datasets:
 
 Mosaic-1b-RedPajama-200b is a 1.4 billion parameter decoder-only transformer trained on the [RedPajama dataset](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T).
 The model was trained for 200B tokens by sampling from the subsets of the RedPajama dataset in the same proportions as were used by the [Llama series of models](https://arxiv.org/abs/2302.13971).
- This model was trained by [MosaicML](https://www.mosaicml.com) and follows the a modified decoder-only transformer architecture.
+ This model was trained by [MosaicML](https://www.mosaicml.com) and follows a modified decoder-only transformer architecture.
 
 ## Model Date
 
@@ -24,6 +24,12 @@ import transformers
 model = transformers.AutoModelForCausalLM.from_pretrained('mosaicml/mosaic-llama-redpajama-final-candidate', trust_remote_code=True)```
 ```
 
+ To use the optimized triton implementation of FlashAttention, you can load with `attn_impl='triton'` and move the model to `bfloat16` like so:
+ ```python
+ model = transformers.AutoModelForCausalLM.from_pretrained('mosaicml/mosaic-1b-redpajama-200b', trust_remote_code=True, attn_impl='triton')
+ model.to(device='cuda:0', dtype=torch.bfloat16)
+ ```
+ 
 ## Model Description
 
 This model uses the MosaicML LLM codebase, which can be found in the [MosaicML Examples Repository](https://github.com/mosaicml/examples/tree/v0.0.4/examples/llm).
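For reference, a minimal end-to-end sketch of the usage the new snippet enables might look like the following. The tokenizer name, prompt, and sampling settings are assumptions not stated in the README excerpt above, and it presumes the custom `mosaic_gpt` model class exposes the standard `generate()` API and that a CUDA device with the `triton` package is available.

```python
import torch
import transformers

# Assumption: tokenizer choice is not specified in the README excerpt; gpt-neox-20b is a guess.
tokenizer = transformers.AutoTokenizer.from_pretrained('EleutherAI/gpt-neox-20b')

# Load with the triton FlashAttention implementation and cast to bfloat16, as in the snippet above.
model = transformers.AutoModelForCausalLM.from_pretrained(
    'mosaicml/mosaic-1b-redpajama-200b', trust_remote_code=True, attn_impl='triton'
)
model.to(device='cuda:0', dtype=torch.bfloat16)
model.eval()

# Tokenize a prompt, move it to the model's device, and sample a short continuation.
inputs = tokenizer('The RedPajama dataset is', return_tensors='pt').to('cuda:0')
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```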