jacobfulano committed
Commit 8a9076d
Parent: ed2a544

Update details about the Triton Flash Attention with ALiBi implementation

Files changed (1)
  1. README.md +6 -0
README.md CHANGED
@@ -64,6 +64,12 @@ This simply presets the non-learned linear bias matrix in every attention block

  **To fine-tune this model for classification**, follow the [Single-task fine-tuning section of the mosaicml/examples/benchmarks/bert repo](https://github.com/mosaicml/examples/tree/main/examples/benchmarks/bert#fine-tuning).

+ ### [Update 1/2/2024] Triton Flash Attention with ALiBi
+
+ Note that, by default, Triton Flash Attention is **not** enabled or required. To enable our custom implementation of Triton Flash Attention with ALiBi (from March 2023),
+ set `attention_probs_dropout_prob: 0.0`. We are currently working on supporting Flash Attention 2 (see [PR here](https://github.com/mosaicml/examples/pull/440)) and replacing the custom Triton implementation.
+
+
  ### Remote Code

  This model requires that `trust_remote_code=True` be passed to the `from_pretrained` method. This is because we train using [FlashAttention (Dao et al. 2022)](https://arxiv.org/pdf/2205.14135.pdf), which is not part of the `transformers` library and depends on [Triton](https://github.com/openai/triton) and some custom PyTorch code. Since this involves executing arbitrary code, you should consider passing a git `revision` argument that specifies the exact commit of the code, for example:
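
A minimal sketch of such a call (not this repo's official snippet): the checkpoint id and the `revision` value below are placeholders, not values taken from this commit, and setting `attention_probs_dropout_prob` to `0.0` follows the Triton Flash Attention with ALiBi note above.

```python
from transformers import AutoConfig, AutoModelForMaskedLM

# Placeholders: substitute the actual model repo id and the commit SHA you want to pin.
checkpoint = "mosaicml/mosaic-bert-base"
revision = "0123456"

# Load the config pinned to the same revision and disable attention dropout,
# which is the switch described above for the custom Triton Flash Attention + ALiBi path.
config = AutoConfig.from_pretrained(checkpoint, trust_remote_code=True, revision=revision)
config.attention_probs_dropout_prob = 0.0

# trust_remote_code executes the model's custom attention code; revision pins it to an exact commit.
model = AutoModelForMaskedLM.from_pretrained(
    checkpoint,
    config=config,
    trust_remote_code=True,
    revision=revision,
)
```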