arXiv:1612.00593

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

Published on Dec 2, 2016
Authors: Charles R. Qi, Hao Su, Kaichun Mo, Leonidas J. Guibas

Abstract

A point cloud is an important type of geometric data structure. Due to its irregular format, most researchers transform such data into regular 3D voxel grids or collections of images. This, however, renders the data unnecessarily voluminous and causes issues. In this paper, we design a novel type of neural network that directly consumes point clouds and well respects the permutation invariance of points in the input. Our network, named PointNet, provides a unified architecture for applications ranging from object classification and part segmentation to scene semantic parsing. Though simple, PointNet is highly efficient and effective. Empirically, it shows strong performance on par with or even better than the state of the art. Theoretically, we provide an analysis toward understanding what the network has learned and why it is robust to input perturbation and corruption.

Community

Introduces PointNet: a network that operates directly on 3D point clouds (while respecting permutation invariance) and can be used for downstream 3D classification and segmentation. Treats a point cloud as an unordered geometric set: points are unordered, have local structure/interactions, and the output should be invariant to Euclidean transformations.

- Architecture: input points go through an input transform (a T-Net, in the spirit of spatial transformer networks, predicts a 3x3 affine matrix that is matrix-multiplied with the points); a shared MLP (BN+ReLU) lifts each point to 64 dimensions; a second T-Net predicts a 64x64 feature transform (again applied by matrix multiplication), yielding learned per-point embeddings; a shared MLP then projects to 1024 dimensions, max-pooling produces a global feature, and an MLP (with dropout) performs classification. For segmentation, the 64-d local features and the 1024-d global feature are concatenated per point (1088-d), passed through shared MLPs (down to 128-d), and projected to per-point semantic classes; see the sketch below.
- The predicted feature-transform matrix carries a regularization loss that pushes it toward an orthogonal matrix.
- Contains a theoretical analysis (a universal-approximation proof) for the classification and segmentation networks.
- Results: better than 3DShapeNets and VoxNet on the ModelNet40 shape classification benchmark; better part-segmentation mIoU on the ShapeNet dataset (compared to a 3D CNN baseline); also reports results on Stanford 3D scene semantic segmentation.
- Ablations: among order-invariant designs, a shared MLP + max-pooling beats sorting and RNN/LSTM sequential models; the affine feature transforms give an improvement; robustness tests cover point perturbations, missing data, and outliers.
- Supplementary material: comparisons with VoxNet, architecture and training details, other applications (model retrieval and shape correspondence), further ablations (bottleneck dimension), an MNIST experiment treating digits as 2D pixel sets, and more qualitative results.

From Stanford (Leonidas J. Guibas's group).
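Below is a minimal PyTorch sketch of the pipeline as summarized above. Layer widths follow the paper, but the module names (TNet, PointNetCls, PointNetSeg) and minor details are illustrative assumptions, not taken from the authors' released code.

```python
import torch
import torch.nn as nn


class TNet(nn.Module):
    """Mini-network predicting a k x k alignment matrix (3x3 input transform or 64x64 feature transform)."""
    def __init__(self, k=3):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(  # shared per-point MLP (1x1 convs) with BN+ReLU
            nn.Conv1d(k, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Linear(1024, 512), nn.BatchNorm1d(512), nn.ReLU(),
            nn.Linear(512, 256), nn.BatchNorm1d(256), nn.ReLU(),
            nn.Linear(256, k * k),
        )

    def forward(self, x):                          # x: (B, k, N)
        t = self.mlp(x).max(dim=2).values          # symmetric max-pool over points -> (B, 1024)
        t = self.fc(t).view(-1, self.k, self.k)    # (B, k, k)
        eye = torch.eye(self.k, device=x.device).unsqueeze(0)
        return t + eye                             # bias toward the identity transform


def orthogonality_loss(t):
    """Regularizer ||I - A A^T||_F^2 pushing the feature transform toward an orthogonal matrix."""
    eye = torch.eye(t.size(1), device=t.device).unsqueeze(0)
    return ((eye - torch.bmm(t, t.transpose(1, 2))) ** 2).sum(dim=(1, 2)).mean()


class PointNetCls(nn.Module):
    """Classification network; also exposes the per-point (local) and global features for segmentation."""
    def __init__(self, num_classes=40):
        super().__init__()
        self.input_tnet = TNet(k=3)
        self.mlp1 = nn.Sequential(                 # shared MLP: 3 -> 64 -> 64
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
        )
        self.feature_tnet = TNet(k=64)
        self.mlp2 = nn.Sequential(                 # shared MLP: 64 -> 128 -> 1024
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        self.head = nn.Sequential(                 # classification MLP with dropout
            nn.Linear(1024, 512), nn.BatchNorm1d(512), nn.ReLU(),
            nn.Linear(512, 256), nn.BatchNorm1d(256), nn.ReLU(),
            nn.Dropout(0.3),                       # keep ratio 0.7, as in the paper
            nn.Linear(256, num_classes),
        )

    def forward(self, x):                          # x: (B, 3, N) point cloud
        x = torch.bmm(self.input_tnet(x), x)       # 3x3 input alignment
        x = self.mlp1(x)                           # (B, 64, N)
        t_feat = self.feature_tnet(x)              # 64x64 feature alignment matrix
        local = torch.bmm(t_feat, x)               # (B, 64, N) per-point local features
        global_feat = self.mlp2(local).max(dim=2).values  # order-invariant max-pool -> (B, 1024)
        return self.head(global_feat), t_feat, local, global_feat


class PointNetSeg(nn.Module):
    """Segmentation network: concatenates 64-d local and tiled 1024-d global features (1088-d per point)."""
    def __init__(self, num_classes=50):
        super().__init__()
        self.backbone = PointNetCls()
        self.seg_mlp = nn.Sequential(              # shared MLP: 1088 -> 512 -> 256 -> 128 -> classes
            nn.Conv1d(1088, 512, 1), nn.BatchNorm1d(512), nn.ReLU(),
            nn.Conv1d(512, 256, 1), nn.BatchNorm1d(256), nn.ReLU(),
            nn.Conv1d(256, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, num_classes, 1),
        )

    def forward(self, x):                          # x: (B, 3, N)
        _, t_feat, local, global_feat = self.backbone(x)
        g = global_feat.unsqueeze(2).expand(-1, -1, local.size(2))  # tile global feature per point
        return self.seg_mlp(torch.cat([local, g], dim=1)), t_feat  # logits: (B, num_classes, N)
```

At training time the paper adds the orthogonality regularizer on the 64x64 feature transform to the task loss with weight 0.001, e.g. `loss = F.cross_entropy(logits, labels) + 0.001 * orthogonality_loss(t_feat)`.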

Links: website, YouTube, GitHub (PyTorch)
