MapReduce for data analytics

The author’s final year project is part of the Green Campus project, which aims to conserve energy by using smart technologies. To develop such technologies, historical data about energy resource usage has to be analysed by building mathematical models to uncover patterns...

Bibliographic Details
Main Author: Roy Ananya.
Other Authors: Lee Bu Sung
Format: Final Year Project
Language: English
Published: 2013
Subjects: DRNTU::Engineering::Computer science and engineering
Online Access: http://hdl.handle.net/10356/55019
Institution: Nanyang Technological University
id sg-ntu-dr.10356-55019
record_format dspace
spelling sg-ntu-dr.10356-55019 2023-03-03T20:45:20Z MapReduce for data analytics Roy Ananya. Lee Bu Sung School of Computer Engineering DRNTU::Engineering::Computer science and engineering Bachelor of Engineering (Computer Engineering) 2013-11-29T07:47:18Z 2013-11-29T07:47:18Z 2013 2013 Final Year Project (FYP) http://hdl.handle.net/10356/55019 en Nanyang Technological University 68 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering
spellingShingle DRNTU::Engineering::Computer science and engineering
Roy Ananya.
MapReduce for data analytics
description The author’s final year project is part of the Green Campus project, which aims to conserve energy by using smart technologies. To develop such technologies, historical data about energy resource usage has to be analysed by building mathematical models that uncover patterns and correlations and can predict abnormal energy usage. The historical data can be very large if accurate mathematical models are to be built. Processing such a data set sequentially is not feasible when the algorithm’s time complexity is high. Open source frameworks like Hadoop and the MapReduce programming paradigm have made it possible to process huge data sets in parallel on a cluster of machines. As part of this project, the author designed and implemented a customized RapidMiner operator that uses the MapReduce framework for Hidden Markov Model based outlier detection on power consumption data. The MapReduce version of the algorithm was then analysed for accuracy, and its running time was compared with that of a dynamic programming implementation of the same algorithm. The time complexity of the author’s MapReduce version, when run on a cluster of 8 machines, is linear, whereas that of the dynamic programming implementation of the same model is exponential. The accuracy of the model built by the author is between 80% and 100%.
author2 Lee Bu Sung
author_facet Lee Bu Sung
Roy Ananya.
format Final Year Project
author Roy Ananya.
author_sort Roy Ananya.
title MapReduce for data analytics
title_short MapReduce for data analytics
title_full MapReduce for data analytics
title_fullStr MapReduce for data analytics
title_full_unstemmed MapReduce for data analytics
title_sort mapreduce for data analytics
publishDate 2013
url http://hdl.handle.net/10356/55019
_version_ 1759854819085910016