Why is the navit version slower than the normal version?

#1
by VictorSanh HF staff - opened
HuggingFaceM4 org

Reposting this discussion from @yuzaa because I deleted the debug repo they created this question in:

I found that the forward pass of the navit version is about twice as slow at the same resolution. (GPU: A800)

import torch
from transformers import AutoModel

# fixed-resolution baseline (384x384) and the navit-style variant
base = AutoModel.from_pretrained("HuggingFaceM4/siglip-so400m-14-384-flash-attn2", trust_remote_code=True)
navit = AutoModel.from_pretrained("HuggingFaceM4/siglip-so400m-14-980-flash-attn2-navit", trust_remote_code=True)

base_vision = base.vision_model
base_vision.bfloat16().eval().cuda()

navit_vision = navit.vision_model
navit_vision.bfloat16().eval().cuda()

# single 384x384 image so both vision towers see the same input resolution
pixel_values = torch.randn(1, 3, 384, 384).bfloat16().cuda()

# %%time
for i in range(100):
    x = base_vision(pixel_values)

# CPU times: user 1.21 s, sys: 12.4 ms, total: 1.22 s
# Wall time: 1.22 s


# %%time
for i in range(100):
    x = navit_vision(pixel_values)

# CPU times: user 2.63 s, sys: 36.3 ms, total: 2.66 s
# Wall time: 2.66 s
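
As a side note, CUDA kernels launch asynchronously, so timing a bare Python loop can blur where the time actually goes. Below is a slightly more careful timing sketch (illustrative only; it reuses base_vision, navit_vision and pixel_values from the snippet above, and the benchmark helper is not part of the repo) that adds a warm-up, disables autograd, and synchronizes before reading the clock:

import time
import torch

def benchmark(model, pixel_values, iters=100, warmup=10):
    with torch.no_grad():
        # warm-up: lets the CUDA allocator and any kernel autotuning settle first
        for _ in range(warmup):
            model(pixel_values)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            model(pixel_values)
        # wait for all queued kernels before stopping the clock
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

print(f"base : {benchmark(base_vision, pixel_values) * 1e3:.2f} ms / forward")
print(f"navit: {benchmark(navit_vision, pixel_values) * 1e3:.2f} ms / forward")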
HuggingFaceM4 org

I don't know yet, I'll dig into it this week. There shouldn't be such a speed overhead...

Note that this model has not been trained yet after the position embeddings were interpolated and the navit-style handling of images was introduced.

HuggingFaceM4 org
edited Mar 7

So it looks like the flash_attn_varlen_func and flash_attn_func paths of flash attention 2 (the former is used when an attention mask is passed, the latter when no attention mask is passed) have different speeds: the call to _upad_input is expensive when an attention_mask is passed.
I am fixing this now.
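
To make the difference concrete, here is a minimal sketch of the two paths (illustrative only, not the actual modeling code or the fix; it assumes flash-attn 2.x, and the attention_forward helper plus the all-ones shortcut are my own assumptions):

import torch
from flash_attn import flash_attn_func, flash_attn_varlen_func
from flash_attn.bert_padding import unpad_input, pad_input

def attention_forward(q, k, v, attention_mask=None):
    # q, k, v: [batch, seq_len, n_heads, head_dim]
    batch, seq_len = q.shape[:2]

    # Fast path: no mask, or a mask that keeps every token (no padding).
    # flash_attn_func works directly on the padded [batch, seq_len, ...] layout.
    if attention_mask is None or bool(attention_mask.all()):
        return flash_attn_func(q, k, v, dropout_p=0.0, causal=False)

    # Slow path: real padding. Tokens are packed into one long sequence
    # (this is the kind of work _upad_input does), which costs extra
    # gather/copy kernels on every forward pass.
    q_flat, indices, cu_seqlens, max_seqlen = unpad_input(q, attention_mask)[:4]
    k_flat = unpad_input(k, attention_mask)[0]
    v_flat = unpad_input(v, attention_mask)[0]

    out_flat = flash_attn_varlen_func(
        q_flat, k_flat, v_flat,
        cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
        max_seqlen_q=max_seqlen, max_seqlen_k=max_seqlen,
        dropout_p=0.0, causal=False,
    )
    # Scatter the packed output back to the padded [batch, seq_len, ...] layout.
    return pad_input(out_flat, indices, batch, seq_len)

Since the unpadding typically runs inside every attention layer, skipping it when the attention_mask is None or all ones removes that overhead for fixed-resolution inputs like the 384x384 benchmark above.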

VictorSanh changed discussion status to closed
