Reinforcement learning based online request scheduling framework for workload-adaptive edge deep learning inference

The recent advances of deep learning in various mobile and Internet-of-Things applications, coupled with the emergence of edge computing, have led to a strong trend of performing deep learning inference on the edge servers located physically close to the end devices. This trend presents the challeng...

Bibliographic Details
Main Authors:	TAN, Xinrui; LI, Hongjia; XIE, Xiaofei; GUO, Lu; ANSARI, Nirwan; HUANG, Xueqing; WANG, Liming; XU, Zhen; LIU, Yang
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2024
Subjects:	Edge computing; deep learning; inference serving systems; efficient deep learning inference; reinforcement learning; Artificial Intelligence and Robotics; Numerical Analysis and Scientific Computing
Online Access:	https://ink.library.smu.edu.sg/sis_research/9442 https://ink.library.smu.edu.sg/context/sis_research/article/10442/viewcontent/RL_OnlineRequest_av.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-10442
record_format	dspace
spelling	sg-smu-ink.sis_research-10442 2024-11-11T08:07:01Z Reinforcement learning based online request scheduling framework for workload-adaptive edge deep learning inference TAN, Xinrui LI, Hongjia XIE, Xiaofei GUO, Lu ANSARI, Nirwan HUANG, Xueqing WANG, Liming XU, Zhen LIU, Yang The recent advances of deep learning in various mobile and Internet-of-Things applications, coupled with the emergence of edge computing, have led to a strong trend of performing deep learning inference on the edge servers located physically close to the end devices. This trend presents the challenge of how to meet the quality-of-service requirements of inference tasks at the resource-constrained network edge, especially under variable or even bursty inference workloads. Solutions to this challenge have not yet been reported in the related literature. In the present paper, we tackle this challenge by means of workload-adaptive inference request scheduling: in different workload states, via adaptive inference request scheduling policies, different models with diverse model sizes can play different roles to maintain high-quality inference services. To implement this idea, we propose a request scheduling framework for general-purpose edge inference serving systems. Theoretically, we prove that, in our framework, the problem of optimizing the inference request scheduling policies can be formulated as a Markov decision process (MDP). To tackle such an MDP, we use reinforcement learning and propose a policy optimization approach. Through extensive experiments, we empirically demonstrate the effectiveness of our framework in the challenging practical case where the MDP is partially observable.
2024-07-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/9442 info:doi/10.1109/TMC.2024.3429571 https://ink.library.smu.edu.sg/context/sis_research/article/10442/viewcontent/RL_OnlineRequest_av.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Edge computing deep learning inference serving systems efficient deep learning inference reinforcement learning Artificial Intelligence and Robotics Numerical Analysis and Scientific Computing
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Edge computing deep learning inference serving systems efficient deep learning inference reinforcement learning Artificial Intelligence and Robotics Numerical Analysis and Scientific Computing
spellingShingle	Edge computing deep learning inference serving systems efficient deep learning inference reinforcement learning Artificial Intelligence and Robotics Numerical Analysis and Scientific Computing TAN, Xinrui LI, Hongjia XIE, Xiaofei GUO, Lu ANSARI, Nirwan HUANG, Xueqing WANG, Liming XU, Zhen LIU, Yang Reinforcement learning based online request scheduling framework for workload-adaptive edge deep learning inference
description	The recent advances of deep learning in various mobile and Internet-of-Things applications, coupled with the emergence of edge computing, have led to a strong trend of performing deep learning inference on the edge servers located physically close to the end devices. This trend presents the challenge of how to meet the quality-of-service requirements of inference tasks at the resource-constrained network edge, especially under variable or even bursty inference workloads. Solutions to this challenge have not yet been reported in the related literature. In the present paper, we tackle this challenge by means of workload-adaptive inference request scheduling: in different workload states, via adaptive inference request scheduling policies, different models with diverse model sizes can play different roles to maintain high-quality inference services. To implement this idea, we propose a request scheduling framework for general-purpose edge inference serving systems. Theoretically, we prove that, in our framework, the problem of optimizing the inference request scheduling policies can be formulated as a Markov decision process (MDP). To tackle such an MDP, we use reinforcement learning and propose a policy optimization approach. Through extensive experiments, we empirically demonstrate the effectiveness of our framework in the challenging practical case where the MDP is partially observable.
format	text
author	TAN, Xinrui LI, Hongjia XIE, Xiaofei GUO, Lu ANSARI, Nirwan HUANG, Xueqing WANG, Liming XU, Zhen LIU, Yang
author_facet	TAN, Xinrui LI, Hongjia XIE, Xiaofei GUO, Lu ANSARI, Nirwan HUANG, Xueqing WANG, Liming XU, Zhen LIU, Yang
author_sort	TAN, Xinrui
title	Reinforcement learning based online request scheduling framework for workload-adaptive edge deep learning inference
title_short	Reinforcement learning based online request scheduling framework for workload-adaptive edge deep learning inference
title_full	Reinforcement learning based online request scheduling framework for workload-adaptive edge deep learning inference
title_fullStr	Reinforcement learning based online request scheduling framework for workload-adaptive edge deep learning inference
title_full_unstemmed	Reinforcement learning based online request scheduling framework for workload-adaptive edge deep learning inference
title_sort	reinforcement learning based online request scheduling framework for workload-adaptive edge deep learning inference
publisher	Institutional Knowledge at Singapore Management University
publishDate	2024
url	https://ink.library.smu.edu.sg/sis_research/9442 https://ink.library.smu.edu.sg/context/sis_research/article/10442/viewcontent/RL_OnlineRequest_av.pdf
_version_	1816859074902360064

Similar Items