arXiv:2401.08449

CLIPRerank: An Extremely Simple Method for Improving Ad-hoc Video Search

Published on Jan 16, 2024

Abstract

Ad-hoc Video Search (AVS) enables users to search unlabeled video content with on-the-fly textual queries. Current deep learning-based models for AVS are trained to optimize the holistic similarity between short videos and their associated descriptions. However, given the diversity of ad-hoc queries, even a short video may contain only a briefer segment that is truly relevant to a given query, in which case the holistic similarity becomes suboptimal. To remedy this issue, we propose CLIPRerank, a fine-grained re-scoring method. We compute cross-modal similarities between the query and individual video frames using a pre-trained CLIP model, aggregating the per-frame scores by max pooling. This fine-grained score is then added, with a weight, to the initial score to rerank the search results. As such, CLIPRerank is agnostic to the underlying video retrieval model and extremely simple, making it a handy plug-in for boosting AVS. Experiments on the challenging TRECVID AVS benchmarks (2016 through 2021) confirm the effectiveness of the proposed strategy. CLIPRerank consistently improves the TRECVID top performers and multiple existing models, including SEA, W2VV++, Dual Encoding, Dual Task, LAFF, CLIP2Video, TS2-Net and X-CLIP. Our method also works when BLIP-2 is substituted for CLIP.
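The re-scoring recipe in the abstract is simple enough to sketch directly. Below is a minimal Python sketch, assuming frame images have already been extracted from each candidate video and using the open-source OpenAI CLIP package; the weight alpha, the function name cliprerank_score, and the ViT-B/32 backbone are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of CLIPRerank's re-scoring step: query-vs-frame CLIP
# similarities, max-pooled over frames, then weightedly added to the
# initial retrieval score. `alpha` and `cliprerank_score` are assumptions.
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # backbone choice is an assumption

@torch.no_grad()
def cliprerank_score(query: str, frames, initial_score: float,
                     alpha: float = 0.5) -> float:
    """`frames` is a list of PIL images sampled from one candidate video."""
    # Encode the textual query.
    text_feat = model.encode_text(clip.tokenize([query]).to(device))
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

    # Encode every sampled frame.
    images = torch.stack([preprocess(f) for f in frames]).to(device)
    frame_feats = model.encode_image(images)
    frame_feats = frame_feats / frame_feats.norm(dim=-1, keepdim=True)

    # Cosine similarity between the query and each frame; max pooling
    # picks the video's best-matching moment (the fine-grained score).
    fine_grained = (frame_feats @ text_feat.T).squeeze(-1).max().item()

    # Weighted addition with the holistic score from any retrieval model.
    return initial_score + alpha * fine_grained
```

Reranking then amounts to re-sorting the top-ranked candidates of any underlying model (e.g., LAFF or CLIP2Video from the list above) by this combined score; per the abstract, the same recipe also works with BLIP-2 in place of CLIP.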

Community

What does this paper discuss?

