Flash Attention: request

#5
by BFquocanh - opened

ValueError: InternVLChatModel does not support Flash Attention 2.0 yet. Please request to add support where the model is hosted, on its model hub page: https://huggingface.co/OpenGVLab/Mini-InternVL-Chat-4B-V1-5/discussions/new or in the Transformers GitHub repo: https://github.com/huggingface/transformers/issues/new
I know we can change flash_attn_2 to eager to remove the need to install flash_attn. Could someone implement flash_attn in this repo? Would love the speed boost.
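
For reference, the switch the error refers to is the `attn_implementation` argument of `from_pretrained`. A minimal sketch, assuming the standard InternVL loading path with `trust_remote_code` (not code from this repo):

```python
# Minimal sketch: choose the attention backend at load time.
# "eager" avoids the flash_attn dependency; "flash_attention_2"
# requires the flash-attn package and model support.
import torch
from transformers import AutoModel, AutoTokenizer

path = "OpenGVLab/Mini-InternVL-Chat-4B-V1-5"
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    attn_implementation="eager",  # switch to "flash_attention_2" once supported
).eval().cuda()
```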

OpenGVLab org

Thank you for your feedback. Flash attention is now enabled for Phi3; if flash attention is not installed in the environment, eager attention is used automatically.
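
For context, the automatic fallback described above amounts to an availability check at load time. A hypothetical sketch of that logic (not the actual modeling code):

```python
# Hypothetical sketch of the described fallback: use flash attention only
# when the flash_attn package is importable, otherwise fall back to eager.
import importlib.util

has_flash_attn = importlib.util.find_spec("flash_attn") is not None
attn_implementation = "flash_attention_2" if has_flash_attn else "eager"
print(f"Using attention implementation: {attn_implementation}")
```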

czczup changed discussion status to closed
