
# A Convergence Theory for Deep Learning via Over-Parameterization

Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song

Deep neural networks (DNNs) have demonstrated dominating performance in many fields; since AlexNet, the neural networks used in practice have been going wider and deeper. On the theoretical side, a long line of works has focused on why we can train neural networks with only one hidden layer. The theory of multi-layer networks remains somewhat unsettled.

In this work, we prove why simple algorithms such as stochastic gradient descent (SGD) can find $\textit{global minima}$ of the training objective of DNNs. We make only two assumptions: the inputs are non-degenerate and the network is over-parameterized. The latter means the number of hidden neurons is sufficiently large: $\textit{polynomial}$ in $L$, the number of DNN layers, and in $n$, the number of training samples.
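Read informally (with $m$ denoting the hidden width, a symbol introduced here rather than in the abstract), the guarantee has the following shape; this is a paraphrase of the abstract, not the precise theorem statement:

$$\text{inputs non-degenerate} \;\text{ and }\; m \ge \mathrm{poly}(n, L) \;\;\Longrightarrow\;\; \text{SGD from random initialization reaches a global minimum of the training objective.}$$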

As concrete examples, starting from randomly initialized weights, we show that SGD attains 100% accuracy on the training set in classification tasks, or minimizes regression loss at a linear convergence rate $\varepsilon \propto e^{-\Omega(T)}$, with a number of iterations that scales only polynomially in $n$ and $L$. Our theory applies to the widely-used but non-smooth ReLU activation and to any smooth, possibly non-convex loss function. In terms of network architectures, our theory applies at least to fully-connected neural networks, convolutional neural networks (CNN), and residual neural networks (ResNet).
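As a toy numerical illustration of this regime (not the paper's construction: the paper analyzes deep networks and trains all hidden layers, whereas this sketch trains only the hidden layer of a one-hidden-layer ReLU network with fixed random output weights), one can watch plain gradient descent drive the training loss toward zero roughly geometrically once the width $m$ is much larger than $n$. The width, step size, and random data below are arbitrary choices made purely for the demonstration; nothing here reproduces the paper's quantitative bounds.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, m = 20, 10, 5000                    # n samples, input dimension d, width m >> n
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm ("non-degenerate") inputs
y = rng.standard_normal(n)                # arbitrary regression targets

W = rng.standard_normal((m, d))           # trainable hidden-layer weights
a = rng.choice([-1.0, 1.0], size=m)       # fixed random output weights

eta = 2.0                                 # step size (illustrative choice)
for t in range(501):
    pre = X @ W.T                         # (n, m) pre-activations
    out = np.maximum(pre, 0.0) @ a / np.sqrt(m)   # network outputs f(x_i)
    resid = out - y
    loss = 0.5 * np.mean(resid ** 2)
    if t % 50 == 0:
        print(f"iter {t:4d}   training loss {loss:.3e}")
    # Gradient of the mean squared loss w.r.t. W; the ReLU derivative is the
    # indicator (pre > 0).
    grad = ((resid[:, None] * (pre > 0) * a[None, :]) / np.sqrt(m)).T @ X / n
    W -= eta * grad
```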

Publisher URL: http://arxiv.org/abs/1811.03962

arXiv: 1811.03962v2
