minimal example with flash attention

#28
by keepitsane - opened

Hello, just curious if there is a minimal example demonstrating how to use flash attention. Thanks in advance!

Please refer to the documentation at https://huggingface.co/docs/transformers/main/en/model_doc/mistral#combining-mistral-and-flash-attention-2 — simply adding attn_implementation="flash_attention_2" to AutoModel.from_pretrained(...) will do the trick. Note that it requires an Ampere or newer GPU (e.g. A100).
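A minimal sketch following the linked docs (the model id is illustrative; this assumes the flash-attn package is installed and a supported GPU is available):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # illustrative model id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,               # FA2 requires fp16 or bf16
    attn_implementation="flash_attention_2",  # enable Flash Attention 2
    device_map="auto",
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```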

keepitsane changed discussion status to closed

FlashAttention is now also available via torch SDPA (scaled_dot_product_attention) if you use torch>=2.2 and transformers>=4.37.1.
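A minimal sketch of the SDPA route, assuming torch>=2.2 and transformers>=4.37.1 (no flash-attn install needed; torch's scaled_dot_product_attention can dispatch to a FlashAttention kernel on supported GPUs):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",   # illustrative model id
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",    # use torch.nn.functional.scaled_dot_product_attention
    device_map="auto",
)
```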

Best, Michael Feil
