arxiv:2308.09247

Point Contrastive Prediction with Semantic Clustering for Self-Supervised Learning on Point Cloud Videos

Published on Aug 18, 2023

Authors:

Abstract

We propose a unified self-supervised learning framework for point cloud videos that covers both object-centric and scene-centric data. Previous methods commonly conduct representation learning at the clip or frame level and therefore fail to capture fine-grained semantics. Instead of contrasting the representations of clips or frames, we conduct contrastive learning at the point level. Moreover, we introduce a new pretext task that achieves semantic alignment of superpoints, which further helps the representations capture semantic cues at multiple scales. In addition, because dynamic point clouds are highly redundant in the temporal dimension, directly conducting contrastive learning at the point level usually leads to a large number of undesired negatives and insufficient modeling of positive representations. To remedy this, we propose a selection strategy that retains proper negatives and uses high-similarity samples from other instances as positive supplements. Extensive experiments show that our method outperforms supervised counterparts on a wide range of downstream tasks and demonstrates the superior transferability of the learned representations.
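To make the idea of point-level contrast with negative selection concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function name `point_info_nce`, the InfoNCE form of the loss, the temperature, and the `neg_threshold` cutoff are all assumptions chosen for illustration. It only shows the negative-filtering half of the idea (dropping candidate negatives that look too similar to the positive target) and does not model superpoint alignment or positive supplements.

```python
import numpy as np

def point_info_nce(pred, target, temperature=0.1, neg_threshold=0.9):
    """Point-level InfoNCE with a simple similarity-based negative filter.

    pred, target: (N, D) arrays; row i of each describes the same point.
    neg_threshold: hypothetical cutoff (an assumption, not from the paper).
        A candidate negative j for point i is dropped when the cosine
        similarity between target i and target j reaches this value.
    Returns the mean loss and the boolean (N, N) mask of pairs that were kept.
    """
    # L2-normalise so that dot products are cosine similarities.
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    target = target / np.linalg.norm(target, axis=1, keepdims=True)

    logits = pred @ target.T / temperature   # (N, N); diagonal holds positives
    target_sim = target @ target.T           # similarity among the targets

    n = pred.shape[0]
    is_positive = np.eye(n, dtype=bool)
    # Always keep the positive; keep a negative only if it is dissimilar enough.
    keep = is_positive | (target_sim < neg_threshold)

    # Numerically stable log-sum-exp over the kept entries of each row.
    masked = np.where(keep, logits, -np.inf)
    row_max = masked.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(masked - row_max).sum(axis=1))

    loss = log_denom - np.diag(logits)       # -log softmax of the positive
    return loss.mean(), keep
```

With temporally redundant inputs, many rows of `target` are near-duplicates, so the mask removes pairs that would otherwise be pushed apart even though they describe almost the same content. Setting `neg_threshold` above 1 disables the filter and recovers plain InfoNCE over all points.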
