MapReduce for data analytics

The author’s final year project is part of the Green Campus project, which aims to conserve energy by using smart technologies. To develop such technologies, historical data on energy resource usage must be analysed by building mathematical models to uncover patterns...

Bibliographic Details
Main Author:	Roy Ananya.
Other Authors:	Lee Bu Sung
Format:	Final Year Project
Language:	English
Published:	2013
Subjects:	DRNTU::Engineering::Computer science and engineering
Online Access:	http://hdl.handle.net/10356/55019
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-55019
record_format	dspace
spelling	sg-ntu-dr.10356-55019 2023-03-03T20:45:20Z MapReduce for data analytics Roy Ananya. Lee Bu Sung School of Computer Engineering DRNTU::Engineering::Computer science and engineering Bachelor of Engineering (Computer Engineering) 2013-11-29T07:47:18Z 2013-11-29T07:47:18Z 2013 2013 Final Year Project (FYP) http://hdl.handle.net/10356/55019 en Nanyang Technological University 68 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Computer science and engineering
spellingShingle	DRNTU::Engineering::Computer science and engineering Roy Ananya. MapReduce for data analytics
description	The author’s final year project is part of the Green Campus project, which aims to conserve energy by using smart technologies. To develop such technologies, historical data on energy resource usage must be analysed by building mathematical models that uncover patterns and correlations and can predict abnormal energy usage. Building accurate models can require a very large volume of historical data. Processing such a data set sequentially is impractical when the algorithm's time complexity is high. Open-source frameworks such as Hadoop, together with the MapReduce programming paradigm, have made it possible to process huge data sets in parallel on a cluster of machines. As part of this project, the author designed and implemented a custom RapidMiner operator that uses the MapReduce framework for Hidden Markov Model-based outlier detection in power consumption data. The MapReduce version of the algorithm was then evaluated for accuracy, and its running time was compared against a dynamic programming implementation of the same algorithm. When run on a cluster of 8 machines, the MapReduce version of the model developed by the author has linear time complexity, whereas the dynamic programming implementation of the same model has exponential time complexity. The accuracy of the model built by the author is between 80% and 100%.
author2	Lee Bu Sung
author_facet	Lee Bu Sung Roy Ananya.
format	Final Year Project
author	Roy Ananya.
author_sort	Roy Ananya.
title	MapReduce for data analytics
title_short	MapReduce for data analytics
title_full	MapReduce for data analytics
title_fullStr	MapReduce for data analytics
title_full_unstemmed	MapReduce for data analytics
title_sort	mapreduce for data analytics
publishDate	2013
url	http://hdl.handle.net/10356/55019
_version_	1759854819085910016

Similar Items