Papers
arxiv:2311.13774

Learning Hierarchical Polynomials with Three-Layer Neural Networks

Published on Nov 23, 2023
Authors:
,
,

Abstract

We study the problem of learning hierarchical polynomials over the standard Gaussian distribution with three-layer neural networks. We specifically consider target functions of the form h = g circ p where p : R^d rightarrow R is a degree k polynomial and g: R rightarrow R is a degree q polynomial. This function class generalizes the single-index model, which corresponds to k=1, and is a natural class of functions possessing an underlying hierarchical structure. Our main result shows that for a large subclass of degree k polynomials p, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target h up to vanishing test error in mathcal{O}(d^k) samples and polynomial time. This is a strict improvement over kernel methods, which require widetilde Theta(d^{kq}) samples, as well as existing guarantees for two-layer networks, which require the target function to be low-rank. Our result also generalizes prior works on three-layer neural networks, which were restricted to the case of p being a quadratic. When p is indeed a quadratic, we achieve the information-theoretically optimal sample complexity mathcal{O}(d^2), which is an improvement over prior work~nichani2023provable requiring a sample size of widetildeTheta(d^4). Our proof proceeds by showing that during the initial stage of training the network performs feature learning to recover the feature p with mathcal{O}(d^k) samples. This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2311.13774 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2311.13774 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2311.13774 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.