minimal example with flash attention

#28
by keepitsane - opened

Hello, just curious if there is a minimal example demonstrating how to use flash attention. Thanks in advance!

Please refer to the documentation at https://huggingface.co/docs/transformers/main/en/model_doc/mistral#combining-mistral-and-flash-attention-2 — simply adding attn_implementation="flash_attention_2" to AutoModel.from_pretrained(...) will do the trick. Note that it requires an Ampere or newer GPU (e.g. A100).
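A minimal sketch following the linked docs (the model id is illustrative; this assumes the flash-attn package is installed and a supported GPU is available):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # illustrative model id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,               # FA2 requires fp16 or bf16
    attn_implementation="flash_attention_2",  # enable Flash Attention 2
    device_map="auto",
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```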

keepitsane changed discussion status to closed

FlashAttention is now also available via torch SDPA (scaled_dot_product_attention) if you use torch>=2.2 and transformers>=4.37.1.
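A minimal sketch of the SDPA route, assuming torch>=2.2 and transformers>=4.37.1 (no flash-attn install needed; torch's scaled_dot_product_attention can dispatch to a FlashAttention kernel on supported GPUs):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",   # illustrative model id
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",    # use torch.nn.functional.scaled_dot_product_attention
    device_map="auto",
)
```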

Best, Michael Feil
