matlok's Collections
LMM

Papers - Attention - Gated Self-Attention - Spatial Grounding


  • Note: Ablation on gated self-attention. As shown in Figure 3 and Eq. 8 of the main paper, our approach uses gated self-attention to absorb the grounding instruction. An alternative is gated cross-attention [1], where the query is the visual feature and the keys and values are produced from the grounding condition. We ablate this design on COCO2014CD data using LDM... This shows the necessity of information sharing among the visual tokens, which exists in self-attention but not in cross-attention.
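The distinction the note draws can be made concrete with a minimal sketch of the two designs. This is not the paper's implementation: it uses a single attention head with no learned projections, and the function names, token counts, and the scalar gate `gamma` are illustrative assumptions. The key structural difference is that gated self-attention attends over the concatenation of visual and grounding tokens (so visual tokens see each other), while gated cross-attention lets each visual query attend only to the grounding tokens.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # plain scaled dot-product attention, single head, no projections
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

def gated_self_attention(visual, grounding, gamma):
    # concatenate visual and grounding tokens, self-attend over the
    # union, keep only the visual slots, then apply a tanh-gated
    # residual update (gamma is an illustrative learnable scalar)
    x = np.concatenate([visual, grounding], axis=0)
    out = attention(x, x, x)[: len(visual)]
    return visual + np.tanh(gamma) * out

def gated_cross_attention(visual, grounding, gamma):
    # query = visual feature, keys/values = grounding condition only;
    # note each visual token's update ignores the other visual tokens
    out = attention(visual, grounding, grounding)
    return visual + np.tanh(gamma) * out
```

Perturbing one visual token changes every other token's update under gated self-attention but leaves the others untouched under gated cross-attention, which is exactly the information sharing among visual tokens that the ablation isolates.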