Papers
arxiv:2308.10570

Self-Feedback DETR for Temporal Action Detection

Published on Aug 21, 2023
Authors:
,
,

Abstract

Temporal Action Detection (TAD) is challenging but fundamental for real-world video applications. Recently, DETR-based models have been devised for TAD but have not performed well yet. In this paper, we point out the problem in the self-attention of DETR for TAD; the attention modules focus on a few key elements, called <PRE_TAG>temporal collapse problem</POST_TAG>. It degrades the capability of the encoder and decoder since their self-attention modules play no role. To solve the problem, we propose a novel framework, Self-DETR, which utilizes cross-attention maps of the decoder to reactivate self-attention modules. We recover the relationship between encoder features by simple matrix multiplication of the cross-attention map and its transpose. Likewise, we also get the information within decoder queries. By guiding collapsed self-attention maps with the guidance map calculated, we settle down the temporal collapse of self-attention modules in the encoder and decoder. Our extensive experiments demonstrate that Self-DETR resolves the <PRE_TAG>temporal collapse problem</POST_TAG> by keeping high diversity of attention over all layers.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2308.10570 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2308.10570 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2308.10570 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.