arxiv:2301.02364

Object as Query: Lifting any 2D Object Detector to 3D Detection

Published on Jan 6, 2023

Abstract

3D object detection from multi-view images has drawn much attention over the past few years. Existing methods mainly establish 3D representations from multi-view images and adopt a dense detection head for object detection, or employ object queries distributed in 3D space to localize objects. In this paper, we design the Multi-View 2D Objects guided 3D Object Detector (MV2D), which can lift any 2D object detector to multi-view 3D object detection. Since 2D detections provide valuable priors for object existence, MV2D exploits 2D detectors to generate object queries conditioned on the rich image semantics. These dynamically generated queries help MV2D recall objects in the field of view and show a strong capability for localizing 3D objects. For the generated queries, we design a sparse cross-attention module that forces them to focus on the features of specific objects, which suppresses interference from noise. Evaluation results on the nuScenes dataset demonstrate that dynamic object queries and sparse feature aggregation improve 3D detection capability. MV2D also achieves state-of-the-art performance among existing methods. We hope MV2D can serve as a new baseline for future research.
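
The two ideas in the abstract, queries generated from 2D detections and attention restricted to the features of a specific object, can be illustrated with a small NumPy sketch. This is not the MV2D implementation: the function names (`box_mask`, `queries_from_boxes`, `sparse_cross_attention`) are hypothetical, a single feature map stands in for multi-view features, and mean pooling inside each box is an assumed stand-in for whatever query generator the paper uses.

```python
import numpy as np


def box_mask(boxes, height, width):
    """Return a (num_boxes, height*width) boolean mask that is True where a
    pixel centre falls inside the 2D box given as (x1, y1, x2, y2)."""
    ys, xs = np.meshgrid(np.arange(height) + 0.5, np.arange(width) + 0.5, indexing="ij")
    xs, ys = xs.reshape(1, -1), ys.reshape(1, -1)
    x1, y1, x2, y2 = (boxes[:, i:i + 1] for i in range(4))
    return (xs >= x1) & (xs <= x2) & (ys >= y1) & (ys <= y2)


def queries_from_boxes(features, mask):
    """Assumed query generator: average the features inside each box, so
    every query is conditioned on the image content of one 2D detection."""
    counts = np.maximum(mask.sum(axis=1, keepdims=True), 1)
    return (mask.astype(features.dtype) @ features) / counts


def sparse_cross_attention(queries, features, mask):
    """Scaled dot-product attention in which each query may only attend to
    the positions allowed by its own row of the mask."""
    logits = (queries @ features.T) / np.sqrt(queries.shape[-1])
    masked = np.where(mask, logits, -np.inf)
    row_max = masked.max(axis=1, keepdims=True)
    # A box that covers no pixel has row_max == -inf; use 0 to avoid NaNs.
    row_max = np.where(np.isfinite(row_max), row_max, 0.0)
    weights = np.exp(masked - row_max)  # exp(-inf) == 0 outside the box
    denom = weights.sum(axis=1, keepdims=True)
    weights = weights / np.maximum(denom, 1e-12)
    return weights @ features, weights


# Toy data: a 4x4 feature map with 8 channels and two made-up 2D detections.
rng = np.random.default_rng(0)
height, width, channels = 4, 4, 8
features = rng.normal(size=(height * width, channels))
boxes = np.array([[0.0, 0.0, 2.0, 2.0], [2.0, 2.0, 4.0, 4.0]])

mask = box_mask(boxes, height, width)
queries = queries_from_boxes(features, mask)
updated, weights = sparse_cross_attention(queries, features, mask)
print(updated.shape, weights.sum(axis=1))
```

Because the attention weights are exactly zero outside each box, features from the background or from other objects cannot contribute to a query, which is the sense in which the sparse aggregation suppresses interference in this toy setting.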
