When torch.nn.functional.scaled_dot_product_attention calls _scaled_dot_product_attention_math, the model reports an error

#3
by Quasimodo0808 - opened

If the SDPA call in visual.py::attention_fn_default() falls back to the math kernel, its output is contiguous in (B, H, L, D) layout. That output is then transpose()d and view() is executed: https://huggingface.co/THUDM/cogvlm2-video-llama3-chat/blob/main/visual.py#L78. Because the transposed tensor is no longer contiguous, view() reports an error.
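A minimal repro outside of visual.py, assuming PyTorch >= 2.3 where torch.nn.attention.sdpa_kernel is available (on older versions torch.backends.cuda.sdp_kernel plays a similar role):

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

B, H, L, D = 1, 8, 16, 64
q = k = v = torch.randn(B, H, L, D)

with sdpa_kernel(SDPBackend.MATH):                 # force the math kernel
    out = F.scaled_dot_product_attention(q, k, v)  # (B, H, L, D), contiguous

out = out.transpose(1, 2)        # (B, L, H, D): same data, non-contiguous strides
print(out.is_contiguous())       # False

try:
    out.view(B, L, -1)           # view() needs compatible strides -> error
except RuntimeError as e:
    print("view() failed:", e)

print(out.reshape(B, L, -1).shape)   # reshape() copies when needed -> works
```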

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org


Try using reshape() instead.

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

You can try changing output = self.dense(out.view(B, L, -1)) to output = self.dense(out.reshape(B, L, -1)).
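For context, a hedged sketch of where that one-line change sits. This is a hypothetical minimal attention module, not the actual visual.py code; only self.dense and the view/reshape line come from the file, names like query_key_value are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Attention(nn.Module):
    """Hypothetical stand-in for the visual attention block, just to show
    where the one-line change goes; not the real visual.py implementation."""
    def __init__(self, hidden_size: int = 1024, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.query_key_value = nn.Linear(hidden_size, 3 * hidden_size)
        self.dense = nn.Linear(hidden_size, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, L, _ = x.shape
        qkv = (self.query_key_value(x)
               .reshape(B, L, 3, self.num_heads, self.head_dim)
               .permute(2, 0, 3, 1, 4))                # 3 x (B, H, L, D)
        q, k, v = qkv[0], qkv[1], qkv[2]
        out = F.scaled_dot_product_attention(q, k, v)  # (B, H, L, D)
        out = out.transpose(1, 2)                      # (B, L, H, D)
        # was: output = self.dense(out.view(B, L, -1))  # fails on the math kernel
        return self.dense(out.reshape(B, L, -1))       # works with every backend

x = torch.randn(2, 16, 1024)
print(Attention()(x).shape)  # torch.Size([2, 16, 1024])
```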

Is view() used there because the code assumes SDPA's flash-attention kernel?
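If you want to check that yourself, here is a small sketch that forces each backend and tests whether the transposed output is still view-compatible. It assumes PyTorch >= 2.3 and a CUDA GPU with fp16 inputs, since flash-attention needs half precision; as far as I can tell the flash kernel's output layout typically stays contiguous after the transpose, while the math kernel's does not:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

q = k = v = torch.randn(1, 8, 16, 64, device="cuda", dtype=torch.float16)

for backend in (SDPBackend.FLASH_ATTENTION, SDPBackend.MATH):
    with sdpa_kernel(backend):
        out = F.scaled_dot_product_attention(q, k, v)
    # Whether view() can follow transpose() depends on the kernel's output layout.
    print(backend, out.transpose(1, 2).is_contiguous())
```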
