Flash Attention: request

#5
by BFquocanh - opened

ValueError: InternVLChatModel does not support Flash Attention 2.0 yet. Please request to add support where the model is hosted, on its model hub page: https://huggingface.co/OpenGVLab/Mini-InternVL-Chat-4B-V1-5/discussions/new or in the Transformers GitHub repo: https://github.com/huggingface/transformers/issues/new
I know we can change flash_attn_2 to eager to remove the need to install flash_attn. Could someone implement flash_attn in this repo? Would love the speed boost.
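
For reference, the switch the error refers to is the `attn_implementation` argument of `from_pretrained`. A minimal sketch, assuming the standard InternVL loading path with `trust_remote_code` (not code from this repo):

```python
# Minimal sketch: choose the attention backend at load time.
# "eager" avoids the flash_attn dependency; "flash_attention_2"
# requires the flash-attn package and model support.
import torch
from transformers import AutoModel, AutoTokenizer

path = "OpenGVLab/Mini-InternVL-Chat-4B-V1-5"
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    attn_implementation="eager",  # switch to "flash_attention_2" once supported
).eval().cuda()
```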

OpenGVLab org

Thank you for your feedback. Flash attention is now enabled for Phi3; if flash attention is not installed in the environment, eager attention is used automatically.
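
For context, the automatic fallback described above amounts to an availability check at load time. A hypothetical sketch of that logic (not the actual modeling code):

```python
# Hypothetical sketch of the described fallback: use flash attention only
# when the flash_attn package is importable, otherwise fall back to eager.
import importlib.util

has_flash_attn = importlib.util.find_spec("flash_attn") is not None
attn_implementation = "flash_attention_2" if has_flash_attn else "eager"
print(f"Using attention implementation: {attn_implementation}")
```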

czczup changed discussion status to closed
