New insight into hybrid stochastic gradient descent: Beyond with-replacement sampling and convexity

As an incremental-gradient algorithm, the hybrid stochastic gradient descent (HSGD) enjoys merits of both stochastic and full gradient methods for finite-sum problem optimization. However, the existing rate-of-convergence analysis for HSGD is made under with-replacement sampling (WRS) and is restricted to convex problems. It is not clear whether HSGD still carries these advantages under the common practice of without-replacement sampling (WoRS) for non-convex problems. In this paper, we affirmatively answer this open question by showing that under WoRS and for both convex and non-convex problems, it is still possible for HSGD (with constant step-size) to match full gradient descent in rate of convergence, while maintaining comparable sample-size-independent incremental first-order oracle complexity to stochastic gradient descent. For a special class of finite-sum problems with linear prediction models, our convergence results can be further improved in some cases. Extensive numerical results confirm our theoretical affirmation and demonstrate the favorable efficiency of WoRS-based HSGD.
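The abstract describes HSGD as an incremental-gradient method that blends stochastic and full gradient steps on a finite-sum objective, with mini-batches drawn without replacement. As a rough illustration only, and not the authors' exact algorithm, the minimal Python sketch below runs such a loop on an assumed least-squares finite sum: each mini-batch is sampled without replacement (WoRS), a constant step size is used, and the batch size grows so early iterations behave like SGD while later ones approach full gradient descent. The objective, growth factor, step size, and schedule are hypothetical choices for the demonstration.

import numpy as np

def hsgd_least_squares(A, b, step_size=0.1, n_iters=20, batch0=8, growth=1.5, seed=0):
    # Illustrative HSGD-style loop on f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2.
    # The sampling schedule and hyperparameters are assumptions, not the paper's exact settings.
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    batch = batch0
    for _ in range(n_iters):
        m = min(int(batch), n)
        idx = rng.choice(n, size=m, replace=False)   # without-replacement sampling (WoRS)
        residual = A[idx] @ x - b[idx]
        grad = A[idx].T @ residual / m               # mini-batch gradient estimate
        x -= step_size * grad                        # constant step size
        batch *= growth                              # grow the sample toward the full sum
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((500, 10))
    x_true = rng.standard_normal(10)
    b = A @ x_true + 0.01 * rng.standard_normal(500)
    x_hat = hsgd_least_squares(A, b)
    print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))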


Bibliographic Details
Main Authors: ZHOU, Pan, YUAN, Xiao-Tong, FENG, Jiashi
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2018
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/9007
https://ink.library.smu.edu.sg/context/sis_research/article/10010/viewcontent/NeurIPS_2018_new_insight_into_hybrid_stochastic_gradient_descent_beyond_with_replacement_sampling_and_convexity_Paper.pdf
Institution: Singapore Management University
Language: English
id sg-smu-ink.sis_research-10010
record_format dspace
spelling sg-smu-ink.sis_research-100102024-07-25T08:15:18Z New insight into hybrid stochastic gradient descent: Beyond with-replacement sampling and convexity ZHOU, Pan YUAN, Xiao-Tong FENG, Jiashi As an incremental-gradient algorithm, the hybrid stochastic gradient descent (HSGD) enjoys merits of both stochastic and full gradient methods for finite-sum problem optimization. However, the existing rate-of-convergence analysis for HSGD is made under with-replacement sampling (WRS) and is restricted to convex problems. It is not clear whether HSGD still carries these advantages under the common practice of without-replacement sampling (WoRS) for non-convex problems. In this paper, we affirmatively answer this open question by showing that under WoRS and for both convex and non-convex problems, it is still possible for HSGD (with constant step-size) to match full gradient descent in rate of convergence, while maintaining comparable sample-size-independent incremental first-order oracle complexity to stochastic gradient descent. For a special class of finite-sum problems with linear prediction models, our convergence results can be further improved in some cases. Extensive numerical results confirm our theoretical affirmation and demonstrate the favorable efficiency of WoRS-based HSGD. 2018-12-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/9007 https://ink.library.smu.edu.sg/context/sis_research/article/10010/viewcontent/NeurIPS_2018_new_insight_into_hybrid_stochastic_gradient_descent_beyond_with_replacement_sampling_and_convexity_Paper.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University OS and Networks
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic OS and Networks
spellingShingle OS and Networks
ZHOU, Pan
YUAN, Xiao-Tong
FENG, Jiashi
New insight into hybrid stochastic gradient descent: Beyond with-replacement sampling and convexity
description As an incremental-gradient algorithm, the hybrid stochastic gradient descent (HSGD) enjoys merits of both stochastic and full gradient methods for finite-sum problem optimization. However, the existing rate-of-convergence analysis for HSGD is made under with-replacement sampling (WRS) and is restricted to convex problems. It is not clear whether HSGD still carries these advantages under the common practice of without-replacement sampling (WoRS) for non-convex problems. In this paper, we affirmatively answer this open question by showing that under WoRS and for both convex and non-convex problems, it is still possible for HSGD (with constant step-size) to match full gradient descent in rate of convergence, while maintaining comparable sample-size-independent incremental first-order oracle complexity to stochastic gradient descent. For a special class of finite-sum problems with linear prediction models, our convergence results can be further improved in some cases. Extensive numerical results confirm our theoretical affirmation and demonstrate the favorable efficiency of WoRS-based HSGD.
format text
author ZHOU, Pan
YUAN, Xiao-Tong
FENG, Jiashi
author_facet ZHOU, Pan
YUAN, Xiao-Tong
FENG, Jiashi
author_sort ZHOU, Pan
title New insight into hybrid stochastic gradient descent: Beyond with-replacement sampling and convexity
title_short New insight into hybrid stochastic gradient descent: Beyond with-replacement sampling and convexity
title_full New insight into hybrid stochastic gradient descent: Beyond with-replacement sampling and convexity
title_fullStr New insight into hybrid stochastic gradient descent: Beyond with-replacement sampling and convexity
title_full_unstemmed New insight into hybrid stochastic gradient descent: Beyond with-replacement sampling and convexity
title_sort new insight into hybrid stochastic gradient descent: beyond with-replacement sampling and convexity
publisher Institutional Knowledge at Singapore Management University
publishDate 2018
url https://ink.library.smu.edu.sg/sis_research/9007
https://ink.library.smu.edu.sg/context/sis_research/article/10010/viewcontent/NeurIPS_2018_new_insight_into_hybrid_stochastic_gradient_descent_beyond_with_replacement_sampling_and_convexity_Paper.pdf
_version_ 1814047690500079616