Extracting attention maps

#49
by roeehendel - opened

It seems that because the model uses scaled_dot_product_attention, passing output_attentions=True to the forward pass is not supported.
Attention masking via attention_mask is also not supported (and it fails silently; there is no assertion to warn the user).
Is there a workaround to enable these features?
Perhaps there should be an option to fall back to a regular (eager) implementation of attention instead of scaled_dot_product_attention.
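As a sketch of what such a fallback might look like: fused scaled_dot_product_attention kernels compute the softmax internally and discard the attention weights, while an eager implementation can simply return them. The function below is a hypothetical, dependency-free illustration (plain Python, not the model's actual code) of an eager attention that exposes the attention maps:

```python
import math

def eager_attention(q, k, v):
    """Plain (eager) scaled dot-product attention that also returns the
    attention weights, which fused sdpa kernels do not expose.

    q, k, v: lists of row vectors (lists of floats) of equal inner dimension.
    Returns (output, weights), where weights[i][j] is the attention that
    query i pays to key j.
    """
    d = len(q[0])
    # scores[i][j] = (q_i . k_j) / sqrt(d)
    scores = [[sum(qi * kj for qi, kj in zip(qrow, krow)) / math.sqrt(d)
               for krow in k] for qrow in q]
    # Row-wise softmax (numerically stabilized by subtracting the row max).
    weights = []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        weights.append([e / z for e in exps])
    # output[i] = sum_j weights[i][j] * v_j
    out = [[sum(w * vj[c] for w, vj in zip(wrow, v)) for c in range(len(v[0]))]
           for wrow in weights]
    return out, weights
```

A masking variant would add a large negative bias to masked positions in `scores` before the softmax, which is exactly the behavior that currently fails silently. Note also that recent versions of Hugging Face transformers accept `attn_implementation="eager"` in `from_pretrained`, which may be the simplest workaround if this model is loaded through that API.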