Managing data traffic in both intra- and inter- datacenter networks
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2016 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/68806 |
Institution: | Nanyang Technological University |
Summary: | To support large-scale online services, governments and multinational companies such as Google and Microsoft have built many datacenters across the world. Because datacenter networks are critical to the performance of those services, both academia and industry have started to explore how to better design and manage them. Most proposals target intra-datacenter networks and aim to improve the performance of services running in a single datacenter, while another line of research aims to improve the performance of services on the inter-datacenter networks that connect geo-distributed datacenters. In this thesis, we first propose an efficient network monitoring system for intra-datacenter networks, which provides valuable information for applications such as traffic engineering and anomaly detection. We then go one step further and design a new task scheduling algorithm that improves the performance of big data processing jobs running across geographically distributed datacenters on top of inter-datacenter networks. In the first part of the thesis, we design a new monitoring framework for intra-datacenter networks that obtains the traffic matrix, a critical input for a variety of datacenter network applications. Our preliminary study shows that the traffic matrix cannot be estimated accurately from Simple Network Management Protocol (SNMP) counters alone, because the number of available measurements (the link counters) is much smaller than the number of variables (the end-to-end paths) in datacenter networks. We therefore take advantage of the operational logs in datacenter networks to provide extra measurements for the traffic estimation problem. Specifically, we use resource provisioning information in public datacenter networks and service placement information in private datacenter networks to improve the estimation accuracy.
Moreover, we also exploit the lightly utilized links in datacenter networks to obtain a better-determined network tomography problem. Extensive results confirm the promising performance of our approach. In the second part of the thesis, we seek to improve the performance of geo-distributed big data processing, which has emerged as an important analytical tool for governments and multinational corporations, on top of inter-datacenter networks. Conventional wisdom calls for collecting all the data from across the world at a central datacenter, where it is processed using data-parallel applications. This is neither efficient nor practical as the volume of data grows exponentially. Rather than transferring data, we believe that computation tasks should be scheduled where the data is, and that data should be processed with a minimum amount of transfer across datacenters. To this end, we first formulate our problem as an integer linear programming (ILP) problem. We then transform it into a linear programming (LP) problem that can be solved efficiently with standard linear programming solvers in an online fashion. To demonstrate the practicality and efficiency of our approach, we also implement it on top of Apache Spark, a modern framework popular for big data processing. Our experimental results show that we can reduce the job completion time by up to 25% and the amount of traffic transferred among datacenters by up to 75%. |
---|
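
The summary's point that link counters alone leave the traffic matrix underdetermined can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the four-rack topology, the routing matrix, the `rank` helper, and the extra "log" row, which stands in for the kind of side information (for example, service placement showing that two racks never communicate) that the abstract describes.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix via Gauss-Jordan elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    found = 0  # number of pivots found so far
    for col in range(n_cols):
        pivot = next(
            (r for r in range(found, len(rows)) if rows[r][col] != 0), None
        )
        if pivot is None:
            continue  # no pivot in this column
        rows[found], rows[pivot] = rows[pivot], rows[found]
        pivot_val = rows[found][col]
        rows[found] = [x / pivot_val for x in rows[found]]
        for r in range(len(rows)):
            if r != found and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found


# Hypothetical topology: racks A and B send to racks C and D via one core switch.
# Unknowns (end-to-end paths): A->C, A->D, B->C, B->D.
# Each row is one link counter; a 1 means the path crosses that link.
routing = [
    [1, 1, 0, 0],  # uplink of A
    [0, 0, 1, 1],  # uplink of B
    [1, 0, 1, 0],  # downlink to C
    [0, 1, 0, 1],  # downlink to D
]

# Assumed side information: logs indicate that A never talks to D,
# which pins the A->D flow and adds one independent equation.
log_row = [0, 1, 0, 0]
with_log_row = routing + [log_row]

print("unknown flows:", len(routing[0]))
print("rank from link counters only:", rank(routing))
print("rank with the extra log-derived row:", rank(with_log_row))
```

In this toy case the four counters have only three independent equations for four unknown flows, so infinitely many traffic matrices fit the counters; one additional independent constraint makes the system fully determined. Real datacenter topologies have far more paths than links, which is why the gap is much larger in practice.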