Understanding generalization and optimization performance of deep CNNs

This work aims to provide an understanding of the remarkable success of deep convolutional neural networks (CNNs) by theoretically analyzing their generalization performance and establishing optimization guarantees for gradient descent based training algorithms. Specifically, for a CNN model consisting of $l$ convolutional layers and one fully connected layer, we prove that its generalization error is bounded by $O(\sqrt{\theta\widetilde{\varrho}/n})$, where $\theta$ denotes the degrees of freedom of the network parameters and $\widetilde{\varrho} = O\big(\log\big(\prod_{i=1}^{l} b_i(k_i - s_i + 1)/p\big) + \log(b_{l+1})\big)$ encapsulates the architecture parameters, including the kernel size $k_i$, stride $s_i$, pooling size $p$ and parameter magnitude $b_i$. To the best of our knowledge, this is the first generalization bound that depends only on $O(\log(\prod_{i=1}^{l+1} b_i))$, tighter than existing ones that all involve an exponential term like $O(\prod_{i=1}^{l+1} b_i)$. Besides, we prove that for an arbitrary gradient descent algorithm, the approximate stationary point computed by minimizing the empirical risk is also an approximate stationary point of the population risk. This explains well why gradient descent training algorithms usually perform sufficiently well in practice. Furthermore, we prove one-to-one correspondence and convergence guarantees for the non-degenerate stationary points between the empirical and population risks. This implies that a computed local minimum of the empirical risk is also close to a local minimum of the population risk, thus ensuring the good generalization performance of CNNs.
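Written out as a worked equation, the generalization bound stated in the abstract reads as follows (a sketch using only the notation given above; here $n$ is the number of training samples, and the precise constants and assumptions are stated in the paper itself):

$$
\text{generalization error} \;\le\; O\!\left(\sqrt{\frac{\theta\,\widetilde{\varrho}}{n}}\right),
\qquad
\widetilde{\varrho} \;=\; O\!\left(\log\!\Big(\prod_{i=1}^{l} b_i\,(k_i - s_i + 1)/p\Big) \;+\; \log(b_{l+1})\right),
$$

so the dependence on the parameter magnitudes enters only through $\log\big(\prod_{i=1}^{l+1} b_i\big)$ rather than through the product $\prod_{i=1}^{l+1} b_i$ itself.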

Bibliographic Details
Main Authors: ZHOU, Pan, FENG, Jiashi
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2018
Subjects: Theory and Algorithms
Online Access:https://ink.library.smu.edu.sg/sis_research/9010
https://ink.library.smu.edu.sg/context/sis_research/article/10013/viewcontent/2018_ICML_deepCNNs.pdf
Institution: Singapore Management University