Task allocation and scheduling for distributed job execution

Data analytic jobs usually require large volumes of data inputs that are available at geographically distributed locations. Gathering all the data at a central location for processing would not only place heavy traffic burdens on the underlying networks but also slow down the job execution. To achie...

Full description

Saved in:
Bibliographic Details
Main Author: Guan, Yitong
Other Authors: Tang Xueyan
Format: Thesis-Doctor of Philosophy
Language:English
Published: Nanyang Technological University 2022
Subjects:
Online Access:https://hdl.handle.net/10356/160001
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-160001
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
spellingShingle Engineering::Computer science and engineering
Guan, Yitong
Task allocation and scheduling for distributed job execution
description Data analytic jobs usually require large volumes of data inputs that are available at geographically distributed locations. Gathering all the data at a central location for processing would not only place heavy traffic burdens on the underlying networks but also slow down the job execution. To achieve better performance, it is often preferable to take advantage of data locality by distributing job execution across multiple sites located close to the data to be processed. Efficiency and fairness are two important considerations for sharing resources among concurrently running distributed jobs. In this thesis, we focus on the efficiency and fairness aspects of resource allocation in distributed job execution and study three specific c problems, including task assignment and scheduling for improving efficiency of distributed job execution, max-min fair resource allocation for distributed job execution, and generalized max-min fair resource allocation for jobs with tasks executable at multiple sites. First, we study a task assignment and scheduling problem for improving efficiency of distributed job execution in which the data inputs to the jobs may be replicated across multiple locations so that each task of a job can be executed at any one of these locations. To schedule the jobs, we need to determine the processing locations for the tasks of each job and the execution order of the tasks at each location. We focus on the objective of minimizing the average job response time. We first design task assignment algorithms to balance the task allocation among various locations. We then further develop integrated solutions that conduct task assignment and scheduling together. We experimentally evaluate our algorithms using real job traces. The results show that our algorithms can significantly reduce the job response times compared to a baseline that allocates each task to a fixed location for processing. Second, we study fair resource allocation among jobs requiring distributed execution. We extend conventional max-min fairness for resource allocation in a single machine or machine cluster to distributed job execution over multiple sites and define Aggregate Max-min Fairness (AMF) which requires the aggregate resource allocation across all sites to be max-min fair. We show that AMF satisfies the properties of Pareto efficiency, envy-freeness and strategy-proofness, but it does not necessarily satisfy the sharing incentive property. We propose an enhanced version of AMF to guarantee the sharing incentive property. We present algorithms to compute AMF allocations and propose an add-on to optimize the job response times under AMF. Experimental results show that compared with a baseline which simply requires the resource allocation at each site to be max-min fair, AMF performs significantly better in balancing resource allocation and in average job response time, particularly when the workload distribution of jobs among sites is highly skewed. When we design the AMF policy, we assume that a job contains a set of tasks and each task can be allocated to only one fixed site for data locality. For reliability or fault tolerance considerations, it is not uncommon for the data to be replicated across several sites. As a result, there are multiple choices for placing tasks that preserve data locality. Finally, we generalize AMF to such scenarios and define Generalized Aggregate Max-min Fairness (GAMF) which also requires the aggregate resource allocation across all sites to be max-min fair. We show that GAMF satisfies the properties of Pareto efficiency, envy- freeness and strategy-proofness, but it does not necessarily satisfy the sharing incentive property either. We further enhance GAMF to guarantee the sharing incentive property. Experimental results show that GAMF outperforms a baseline that conducts max-min fair allocation at each site separately after distributing tasks with the same available sites of data inputs uniformly among these sites.
author2 Tang Xueyan
author_facet Tang Xueyan
Guan, Yitong
format Thesis-Doctor of Philosophy
author Guan, Yitong
author_sort Guan, Yitong
title Task allocation and scheduling for distributed job execution
title_short Task allocation and scheduling for distributed job execution
title_full Task allocation and scheduling for distributed job execution
title_fullStr Task allocation and scheduling for distributed job execution
title_full_unstemmed Task allocation and scheduling for distributed job execution
title_sort task allocation and scheduling for distributed job execution
publisher Nanyang Technological University
publishDate 2022
url https://hdl.handle.net/10356/160001
_version_ 1743119602148179968
spelling sg-ntu-dr.10356-1600012022-08-01T05:07:18Z Task allocation and scheduling for distributed job execution Guan, Yitong Tang Xueyan School of Computer Science and Engineering Parallel and Distributed Computing Centre ASXYTang@ntu.edu.sg Engineering::Computer science and engineering Data analytic jobs usually require large volumes of data inputs that are available at geographically distributed locations. Gathering all the data at a central location for processing would not only place heavy traffic burdens on the underlying networks but also slow down the job execution. To achieve better performance, it is often preferable to take advantage of data locality by distributing job execution across multiple sites located close to the data to be processed. Efficiency and fairness are two important considerations for sharing resources among concurrently running distributed jobs. In this thesis, we focus on the efficiency and fairness aspects of resource allocation in distributed job execution and study three specific c problems, including task assignment and scheduling for improving efficiency of distributed job execution, max-min fair resource allocation for distributed job execution, and generalized max-min fair resource allocation for jobs with tasks executable at multiple sites. First, we study a task assignment and scheduling problem for improving efficiency of distributed job execution in which the data inputs to the jobs may be replicated across multiple locations so that each task of a job can be executed at any one of these locations. To schedule the jobs, we need to determine the processing locations for the tasks of each job and the execution order of the tasks at each location. We focus on the objective of minimizing the average job response time. We first design task assignment algorithms to balance the task allocation among various locations. We then further develop integrated solutions that conduct task assignment and scheduling together. We experimentally evaluate our algorithms using real job traces. The results show that our algorithms can significantly reduce the job response times compared to a baseline that allocates each task to a fixed location for processing. Second, we study fair resource allocation among jobs requiring distributed execution. We extend conventional max-min fairness for resource allocation in a single machine or machine cluster to distributed job execution over multiple sites and define Aggregate Max-min Fairness (AMF) which requires the aggregate resource allocation across all sites to be max-min fair. We show that AMF satisfies the properties of Pareto efficiency, envy-freeness and strategy-proofness, but it does not necessarily satisfy the sharing incentive property. We propose an enhanced version of AMF to guarantee the sharing incentive property. We present algorithms to compute AMF allocations and propose an add-on to optimize the job response times under AMF. Experimental results show that compared with a baseline which simply requires the resource allocation at each site to be max-min fair, AMF performs significantly better in balancing resource allocation and in average job response time, particularly when the workload distribution of jobs among sites is highly skewed. When we design the AMF policy, we assume that a job contains a set of tasks and each task can be allocated to only one fixed site for data locality. For reliability or fault tolerance considerations, it is not uncommon for the data to be replicated across several sites. As a result, there are multiple choices for placing tasks that preserve data locality. Finally, we generalize AMF to such scenarios and define Generalized Aggregate Max-min Fairness (GAMF) which also requires the aggregate resource allocation across all sites to be max-min fair. We show that GAMF satisfies the properties of Pareto efficiency, envy- freeness and strategy-proofness, but it does not necessarily satisfy the sharing incentive property either. We further enhance GAMF to guarantee the sharing incentive property. Experimental results show that GAMF outperforms a baseline that conducts max-min fair allocation at each site separately after distributing tasks with the same available sites of data inputs uniformly among these sites. Doctor of Philosophy 2022-07-12T01:15:11Z 2022-07-12T01:15:11Z 2022 Thesis-Doctor of Philosophy Guan, Y. (2022). Task allocation and scheduling for distributed job execution. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/160001 https://hdl.handle.net/10356/160001 10.32657/10356/160001 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University