Towards advanced distributed data processing: framework, optimization, and application

Bibliographic Details
Main Author:	Liu, Kaiqi
Other Authors:	Mo Li
Format:	Thesis-Doctor of Philosophy
Language:	English
Published:	Nanyang Technological University 2024
Subjects:	Computer and Information Science
Online Access:	https://hdl.handle.net/10356/177576
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-177576
record_format	dspace
spelling	sg-ntu-dr.10356-1775762024-06-03T06:51:20Z Towards advanced distributed data processing: framework, optimization, and application Liu, Kaiqi Mo Li School of Computer Science and Engineering Alibaba-NTU Singapore Joint Research Institute limo@ntu.edu.sg Computer and Information Science The surge in available big data has drawn significant interest in distributed processing methods capable of handling the ever-expanding data volume and increasing computational complexity efficiently and at scale. While existing distributed data processing frameworks, such as Apache Spark, have proven effective in various applications, there is still considerable room for improvement and exploration in this field. This thesis focuses on three key aspects of advancing distributed data processing using Apache Spark. First, a novel framework is introduced to extend Spark’s capabilities, enabling the efficient processing of large-scale spatio-temporal data to better serve machine-learning applications. This framework not only achieves high efficiency but also provides a user-friendly interface. Second, a deep-learning-based optimization approach tailored to enhance the efficiency of Spark SQL execution is proposed. The end-to-end system integration of this approach leads to practical performance gains. Last, a distributed solution for computationally intensive, large-scale microscopic crowd simulation is designed and implemented, aiming to improve the scalability and efficiency of such applications. These three works collectively expand the application of distributed computing and enhance efficiency through the implementation of state-of-the-art techniques. Doctor of Philosophy 2024-05-29T04:45:33Z 2024-05-29T04:45:33Z 2024 Thesis-Doctor of Philosophy Liu, K. (2024). Towards advanced distributed data processing: framework, optimization, and application. Doctoral thesis, Nanyang Technological University, Singapore.
https://hdl.handle.net/10356/177576 10.32657/10356/177576 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Computer and Information Science
spellingShingle	Computer and Information Science Liu, Kaiqi Towards advanced distributed data processing: framework, optimization, and application
description	The surge in available big data has drawn significant interest in distributed processing methods capable of handling the ever-expanding data volume and increasing computational complexity efficiently and at scale. While existing distributed data processing frameworks, such as Apache Spark, have proven effective in various applications, there is still considerable room for improvement and exploration in this field. This thesis focuses on three key aspects of advancing distributed data processing using Apache Spark. First, a novel framework is introduced to extend Spark’s capabilities, enabling the efficient processing of large-scale spatio-temporal data to better serve machine-learning applications. This framework not only achieves high efficiency but also provides a user-friendly interface. Second, a deep-learning-based optimization approach tailored to enhance the efficiency of Spark SQL execution is proposed. The end-to-end system integration of this approach leads to practical performance gains. Last, a distributed solution for computationally intensive, large-scale microscopic crowd simulation is designed and implemented, aiming to improve the scalability and efficiency of such applications. These three works collectively expand the application of distributed computing and enhance efficiency through the implementation of state-of-the-art techniques.
author2	Mo Li
author_facet	Mo Li Liu, Kaiqi
format	Thesis-Doctor of Philosophy
author	Liu, Kaiqi
author_sort	Liu, Kaiqi
title	Towards advanced distributed data processing: framework, optimization, and application
title_short	Towards advanced distributed data processing: framework, optimization, and application
title_full	Towards advanced distributed data processing: framework, optimization, and application
title_fullStr	Towards advanced distributed data processing: framework, optimization, and application
title_full_unstemmed	Towards advanced distributed data processing: framework, optimization, and application
title_sort	towards advanced distributed data processing: framework, optimization, and application
publisher	Nanyang Technological University
publishDate	2024
url	https://hdl.handle.net/10356/177576
_version_	1806059925243166720
