running model on a Tesla T4

#2 by xiaoajie738

Hello, can this model run on a Tesla T4? When I run this model on a T4, I get the following error: "RuntimeError: FlashAttention only supports Ampere GPUs or newer."

OpenGVLab org

Could you please try again now? I have changed this part of the code so that it automatically falls back to the original PyTorch attention when flash attention is not available.
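
For reference, a minimal sketch of that kind of fallback (illustrative only, not the actual modeling_intern_vit.py code; the helper name is an assumption):

```python
# Illustrative fallback sketch, not the actual modeling_intern_vit.py code:
# use flash attention only if the package imports and the GPU is Ampere
# (compute capability >= 8.0) or newer; otherwise fall back to eager attention.
import torch

try:
    import flash_attn  # noqa: F401
    has_flash_attn = True
except ImportError:
    has_flash_attn = False


def pick_attn_implementation() -> str:
    if has_flash_attn and torch.cuda.is_available():
        major, _ = torch.cuda.get_device_capability()
        if major >= 8:  # a Tesla T4 is compute capability 7.5, so it falls through
            return "flash_attention_2"
    return "eager"  # original PyTorch attention


print(pick_attn_implementation())
```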

@czczup I tried it on a Tesla T4 GPU and it still does not seem to work.
This is the main error:

RuntimeError: FlashAttention only supports Ampere GPUs or newer.

I'm not sure, but this might be useful; I got this in the traceback:

File ~/.cache/huggingface/modules/transformers_modules/OpenGVLab/Mini-InternVL-Chat-2B-V1-5/e29cbb875c3039de7d81258cb5efaf754bf7d42c/modeling_intern_vit.py:77, in FlashAttention.forward(self, qkv, key_padding_mask, causal, cu_seqlens, max_s, need_weights)
     74     max_s = seqlen
     75     cu_seqlens = torch.arange(0, (batch_size + 1) * seqlen, step=seqlen, dtype=torch.int32,
     76                               device=qkv.device)
---> 77     output = flash_attn_unpadded_qkvpacked_func(
     78         qkv, cu_seqlens, max_s, self.dropout_p if self.training else 0.0,
     79         softmax_scale=self.softmax_scale, causal=causal
     80     )
     81     output = rearrange(output, '(b s) ... -> b s ...', b=batch_size)
     82 else:
OpenGVLab org

You can modify the config.json in your downloaded model files:
change "attn_implementation": "flash_attention_2" to "attn_implementation": "eager".


OpenGVLab org

If you have flash attention installed in your environment, try uninstalling it.

Thank you. I am now able to run it on T4.

Yep same, thank you!
