Task allocation and scheduling for distributed job execution
Data analytic jobs usually require large volumes of data inputs that are available at geographically distributed locations. Gathering all the data at a central location for processing would not only place heavy traffic burdens on the underlying networks but also slow down the job execution. To achie...
Saved in:
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: |
Nanyang Technological University
2022
|
Subjects: | |
Online Access: | https://hdl.handle.net/10356/160001 |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Institution: | Nanyang Technological University |
Language: | English |
id |
sg-ntu-dr.10356-160001 |
---|---|
record_format |
dspace |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering |
spellingShingle |
Engineering::Computer science and engineering Guan, Yitong Task allocation and scheduling for distributed job execution |
description |
Data analytic jobs usually require large volumes of data inputs that are available at geographically distributed locations. Gathering all the data at a central location for processing would not only place heavy traffic burdens on the underlying networks but also slow down the job execution. To achieve better performance, it is often preferable to take advantage of data locality by distributing job execution across multiple sites located close to the data to be processed. Efficiency and fairness are two important considerations for sharing resources among concurrently running distributed jobs. In this thesis, we focus on the efficiency and fairness aspects of resource allocation in distributed job execution and study three specific c problems, including task assignment and scheduling for improving efficiency of distributed job execution, max-min fair resource allocation for distributed job execution, and generalized max-min fair resource allocation for jobs with tasks executable at multiple sites.
First, we study a task assignment and scheduling problem for improving efficiency of distributed job execution in which the data inputs to the jobs may be replicated across multiple locations so that each task of a job can be executed at any one of these locations. To schedule the jobs, we need to determine the processing locations for the tasks of each job and the execution order of the tasks at each location. We focus on the objective of minimizing the average job response time. We first design task assignment algorithms to balance the task allocation among various locations. We then further develop integrated solutions that conduct task assignment and scheduling together. We experimentally evaluate our algorithms using real job traces. The results show that our algorithms can significantly reduce the job response times compared to a baseline that allocates each task to a fixed location for processing.
Second, we study fair resource allocation among jobs requiring distributed execution. We extend conventional max-min fairness for resource allocation in a single machine or machine cluster to distributed job execution over multiple sites and define Aggregate Max-min Fairness (AMF) which requires the aggregate resource allocation across all sites to be max-min fair. We show that AMF satisfies the properties of Pareto efficiency, envy-freeness and strategy-proofness, but it does not necessarily satisfy the sharing incentive property. We propose an enhanced version of AMF to guarantee the sharing incentive property. We present algorithms to compute AMF allocations and propose an add-on to optimize the job response times under AMF. Experimental results show that compared with a baseline which simply requires the resource allocation at each site to be max-min fair, AMF performs significantly better in balancing resource allocation and in average job response time, particularly when the workload distribution of jobs among sites is highly skewed.
When we design the AMF policy, we assume that a job contains a set of tasks and each task can be allocated to only one fixed site for data locality. For reliability or fault tolerance considerations, it is not uncommon for the data to be replicated across several sites. As a result, there are multiple choices for placing tasks that preserve data locality. Finally, we generalize AMF to such scenarios and define Generalized Aggregate Max-min Fairness (GAMF) which also requires the aggregate resource allocation across all sites to be max-min fair. We show that GAMF satisfies the properties of Pareto efficiency, envy- freeness and strategy-proofness, but it does not necessarily satisfy the sharing incentive property either. We further enhance GAMF to guarantee the sharing incentive property. Experimental results show that GAMF outperforms a baseline that conducts max-min fair allocation at each site separately after distributing tasks with the same available sites of data inputs uniformly among these sites. |
author2 |
Tang Xueyan |
author_facet |
Tang Xueyan Guan, Yitong |
format |
Thesis-Doctor of Philosophy |
author |
Guan, Yitong |
author_sort |
Guan, Yitong |
title |
Task allocation and scheduling for distributed job execution |
title_short |
Task allocation and scheduling for distributed job execution |
title_full |
Task allocation and scheduling for distributed job execution |
title_fullStr |
Task allocation and scheduling for distributed job execution |
title_full_unstemmed |
Task allocation and scheduling for distributed job execution |
title_sort |
task allocation and scheduling for distributed job execution |
publisher |
Nanyang Technological University |
publishDate |
2022 |
url |
https://hdl.handle.net/10356/160001 |
_version_ |
1743119602148179968 |
spelling |
sg-ntu-dr.10356-1600012022-08-01T05:07:18Z Task allocation and scheduling for distributed job execution Guan, Yitong Tang Xueyan School of Computer Science and Engineering Parallel and Distributed Computing Centre ASXYTang@ntu.edu.sg Engineering::Computer science and engineering Data analytic jobs usually require large volumes of data inputs that are available at geographically distributed locations. Gathering all the data at a central location for processing would not only place heavy traffic burdens on the underlying networks but also slow down the job execution. To achieve better performance, it is often preferable to take advantage of data locality by distributing job execution across multiple sites located close to the data to be processed. Efficiency and fairness are two important considerations for sharing resources among concurrently running distributed jobs. In this thesis, we focus on the efficiency and fairness aspects of resource allocation in distributed job execution and study three specific c problems, including task assignment and scheduling for improving efficiency of distributed job execution, max-min fair resource allocation for distributed job execution, and generalized max-min fair resource allocation for jobs with tasks executable at multiple sites. First, we study a task assignment and scheduling problem for improving efficiency of distributed job execution in which the data inputs to the jobs may be replicated across multiple locations so that each task of a job can be executed at any one of these locations. To schedule the jobs, we need to determine the processing locations for the tasks of each job and the execution order of the tasks at each location. We focus on the objective of minimizing the average job response time. We first design task assignment algorithms to balance the task allocation among various locations. We then further develop integrated solutions that conduct task assignment and scheduling together. We experimentally evaluate our algorithms using real job traces. The results show that our algorithms can significantly reduce the job response times compared to a baseline that allocates each task to a fixed location for processing. Second, we study fair resource allocation among jobs requiring distributed execution. We extend conventional max-min fairness for resource allocation in a single machine or machine cluster to distributed job execution over multiple sites and define Aggregate Max-min Fairness (AMF) which requires the aggregate resource allocation across all sites to be max-min fair. We show that AMF satisfies the properties of Pareto efficiency, envy-freeness and strategy-proofness, but it does not necessarily satisfy the sharing incentive property. We propose an enhanced version of AMF to guarantee the sharing incentive property. We present algorithms to compute AMF allocations and propose an add-on to optimize the job response times under AMF. Experimental results show that compared with a baseline which simply requires the resource allocation at each site to be max-min fair, AMF performs significantly better in balancing resource allocation and in average job response time, particularly when the workload distribution of jobs among sites is highly skewed. When we design the AMF policy, we assume that a job contains a set of tasks and each task can be allocated to only one fixed site for data locality. For reliability or fault tolerance considerations, it is not uncommon for the data to be replicated across several sites. As a result, there are multiple choices for placing tasks that preserve data locality. Finally, we generalize AMF to such scenarios and define Generalized Aggregate Max-min Fairness (GAMF) which also requires the aggregate resource allocation across all sites to be max-min fair. We show that GAMF satisfies the properties of Pareto efficiency, envy- freeness and strategy-proofness, but it does not necessarily satisfy the sharing incentive property either. We further enhance GAMF to guarantee the sharing incentive property. Experimental results show that GAMF outperforms a baseline that conducts max-min fair allocation at each site separately after distributing tasks with the same available sites of data inputs uniformly among these sites. Doctor of Philosophy 2022-07-12T01:15:11Z 2022-07-12T01:15:11Z 2022 Thesis-Doctor of Philosophy Guan, Y. (2022). Task allocation and scheduling for distributed job execution. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/160001 https://hdl.handle.net/10356/160001 10.32657/10356/160001 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University |