Optimization planning for 3D ConvNets

Optimally training a 3D Convolutional Neural Network (3D ConvNet) is not trivial, owing to its high complexity and the many options in the training scheme. The most common hand-tuning process first trains 3D ConvNets on short video clips and then learns long-term temporal de...

Bibliographic Details
Main Authors:	QIU, Zhaofan, YAO, Ting, NGO, Chong-wah, MEI, Tao
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2021
Subjects:	OS and Networks
Online Access:	https://ink.library.smu.edu.sg/sis_research/6728 https://ink.library.smu.edu.sg/context/sis_research/article/7731/viewcontent/icml21.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-7731
record_format	dspace
spelling	sg-smu-ink.sis_research-7731 2022-01-27T11:10:57Z Optimization planning for 3D ConvNets QIU, Zhaofan YAO, Ting NGO, Chong-wah MEI, Tao 2021-07-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/6728 https://ink.library.smu.edu.sg/context/sis_research/article/7731/viewcontent/icml21.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University OS and Networks
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	OS and Networks
spellingShingle	OS and Networks QIU, Zhaofan YAO, Ting NGO, Chong-wah MEI, Tao Optimization planning for 3D ConvNets
description	Optimally training a 3D Convolutional Neural Network (3D ConvNet) is not trivial, owing to its high complexity and the many options in the training scheme. The most common hand-tuning process first trains 3D ConvNets on short video clips and then learns long-term temporal dependency on lengthy clips, while gradually decaying the learning rate from high to low as training progresses. Because this process relies on several heuristic settings, the study seeks an optimal "path" that automates the entire training. In this paper, we decompose the path into a series of training "states" and specify the hyper-parameters, e.g., learning rate and the length of input clips, in each state. The estimation of the knee point on the performance-epoch curve triggers the transition from one state to another. We perform dynamic programming over all the candidate states to plan the optimal permutation of states, i.e., the optimization path. Furthermore, we devise a new 3D ConvNet with a unique dual-head classifier design to improve spatial and temporal discrimination. Extensive experiments on seven public video recognition benchmarks demonstrate the advantages of our proposal. With optimization planning, our 3D ConvNets achieve superior results compared with state-of-the-art recognition methods. More remarkably, we obtain top-1 accuracy of 80.5% and 82.7% on the Kinetics-400 and Kinetics-600 datasets, respectively.
format	text
author	QIU, Zhaofan YAO, Ting NGO, Chong-wah MEI, Tao
author_facet	QIU, Zhaofan YAO, Ting NGO, Chong-wah MEI, Tao
author_sort	QIU, Zhaofan
title	Optimization planning for 3D ConvNets
title_sort	optimization planning for 3d convnets
publisher	Institutional Knowledge at Singapore Management University
publishDate	2021
url	https://ink.library.smu.edu.sg/sis_research/6728 https://ink.library.smu.edu.sg/context/sis_research/article/7731/viewcontent/icml21.pdf
_version_	1770576055365533696
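
The description above says that a knee point on the performance-epoch curve triggers the transition between training states. The record does not say how the paper estimates that knee, so the following is only a minimal sketch of one common heuristic: pick the epoch whose score lies farthest above the straight line joining the first and last points of the curve. The function name knee_point and the accuracy values are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def knee_point(scores: Sequence[float]) -> int:
    """Return the index of the point farthest above the chord
    joining the first and last points of the curve.

    This is a generic max-distance-to-chord heuristic, assumed here
    for illustration; it is not the estimator used in the paper.
    """
    n = len(scores)
    if n < 3:
        raise ValueError("need at least three points to locate a knee")
    first, last = scores[0], scores[-1]
    best_index, best_gap = 0, float("-inf")
    for i, value in enumerate(scores):
        # Height of the chord at epoch i (linear interpolation).
        chord = first + (last - first) * i / (n - 1)
        gap = value - chord
        if gap > best_gap:
            best_index, best_gap = i, gap
    return best_index


# Hypothetical validation accuracy per epoch for one training state.
accuracy_per_epoch = [0.10, 0.35, 0.52, 0.61, 0.64, 0.65, 0.655, 0.66]
print(knee_point(accuracy_per_epoch))
```

In a planner of the kind the description outlines, the returned epoch would mark the point at which one state hands over to the next; how the candidate transitions are then scored and combined by dynamic programming is not specified in this record.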

Similar Items