High performance data processing systems in cloud

The term “Big Data” is often closely associated with technologies such as Apache Hadoop and the “NoSQL” class of databases, for example MongoDB and Neo4j. It is possible to stream real-time data analytics using these technologies with ease and these analytics usually accomplishe...

Bibliographic Details
Main Author:	Tan, Xuan Min
Other Authors:	He Bingsheng
Format:	Final Year Project
Language:	English
Published:	2015
Subjects:	DRNTU::Engineering::Computer science and engineering::Data::Data structures DRNTU::Engineering::Computer science and engineering::Computer systems organization::Special-purpose and application-based systems
Online Access:	http://hdl.handle.net/10356/62851
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-62851
record_format	dspace
spelling	sg-ntu-dr.10356-62851 2023-03-03T20:24:28Z School of Computer Engineering Bachelor of Engineering (Computer Science) 2015-04-30T02:48:00Z 2015-04-30T02:48:00Z 2015 2015 Final Year Project (FYP) http://hdl.handle.net/10356/62851 en Nanyang Technological University 46 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Computer science and engineering::Data::Data structures DRNTU::Engineering::Computer science and engineering::Computer systems organization::Special-purpose and application-based systems
description	The term “Big Data” is often closely associated with technologies such as Apache Hadoop and the “NoSQL” class of databases, for example MongoDB and Neo4j. These technologies make it possible to stream real-time data analytics with ease, and such analytics are usually completed in 20 minutes or less. In recent years many such open source technologies have emerged, but it is unclear how many of them are really efficient and suitable for processing iterative data such as graphs. Some graph processing systems, such as GraphLab and Apache Giraph, were inspired by the Bulk Synchronous Parallel (BSP) model, while others, like Hadoop, follow Google’s MapReduce framework. In this project, both the BSP model and the MapReduce framework were studied intensively using two prominent open source projects, Hadoop and Giraph. A series of large graph processing jobs was executed on both systems and the results were analyzed. The experiments show that Giraph performs better at processing iterative data.
author2	He Bingsheng
author_facet	He Bingsheng Tan, Xuan Min
format	Final Year Project
author	Tan, Xuan Min
author_sort	Tan, Xuan Min
title	High performance data processing systems in cloud
publishDate	2015
url	http://hdl.handle.net/10356/62851
_version_	1759854948943659008

Similar Items