Understanding generalization and optimization performance of deep CNNs

This work aims to provide an understanding of the remarkable success of deep convolutional neural networks (CNNs) by theoretically analyzing their generalization performance and establishing optimization guarantees for gradient descent based training algorithms. Specifically, for a CNN model consisting of $l$ convolutional layers and one fully connected layer, we prove that its generalization error is bounded by $O(\sqrt{\theta\widetilde{\varrho}/n})$, where $\theta$ denotes the degrees of freedom of the network parameters and $\widetilde{\varrho} = O\big(\log\big(\prod_{i=1}^{l} b_i(k_i - s_i + 1)/p\big) + \log(b_{l+1})\big)$ encapsulates the architecture parameters, including the kernel size $k_i$, stride $s_i$, pooling size $p$ and parameter magnitude $b_i$. To the best of our knowledge, this is the first generalization bound that depends only on $O(\log(\prod_{i=1}^{l+1} b_i))$, tighter than existing ones that all involve an exponential term like $O(\prod_{i=1}^{l+1} b_i)$. Besides, we prove that for an arbitrary gradient descent algorithm, the approximate stationary point computed by minimizing the empirical risk is also an approximate stationary point of the population risk. This explains well why gradient descent training algorithms usually perform sufficiently well in practice. Furthermore, we prove one-to-one correspondence and convergence guarantees for the non-degenerate stationary points between the empirical and population risks. This implies that a computed local minimum of the empirical risk is also close to a local minimum of the population risk, thus ensuring the good generalization performance of CNNs.
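Written out as a worked equation, the generalization bound stated in the abstract reads as follows (a sketch using only the notation given above; here $n$ is the number of training samples, and the precise constants and assumptions are stated in the paper itself):

$$
\text{generalization error} \;\le\; O\!\left(\sqrt{\frac{\theta\,\widetilde{\varrho}}{n}}\right),
\qquad
\widetilde{\varrho} \;=\; O\!\left(\log\!\Big(\prod_{i=1}^{l} b_i\,(k_i - s_i + 1)/p\Big) \;+\; \log(b_{l+1})\right),
$$

so the dependence on the parameter magnitudes enters only through $\log\big(\prod_{i=1}^{l+1} b_i\big)$ rather than through the product $\prod_{i=1}^{l+1} b_i$ itself.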

Bibliographic Details
Main Authors: ZHOU, Pan, FENG, Jiashi
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2018
Subjects: Theory and Algorithms
Online Access:https://ink.library.smu.edu.sg/sis_research/9010
https://ink.library.smu.edu.sg/context/sis_research/article/10013/viewcontent/2018_ICML_deepCNNs.pdf
Institution: Singapore Management University