MapReduce for data analytics
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | 2013 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/55019 |
Institution: | Nanyang Technological University |
Summary: | The author’s final year project is part of the Green Campus project, which aims to conserve energy by using smart technologies. To develop such technologies, historical data on energy resource usage must be analysed by building mathematical models that uncover patterns and correlations and can predict abnormal energy usage.
The historical data to be analysed can be very large if accurate mathematical models are to be built. Processing such a data set sequentially becomes impractical when the algorithm's time complexity is high. Open source frameworks such as Hadoop, together with the MapReduce programming paradigm, have made it possible to process huge data sets in parallel on a cluster of machines.
As part of this project, the author designed and implemented a custom RapidMiner operator, built on the MapReduce framework, for Hidden Markov Model-based outlier detection in power consumption data. The MapReduce version of the algorithm was then evaluated for accuracy, and its running time was compared with that of a dynamic programming implementation of the same algorithm.
When run on a cluster of 8 machines, the MapReduce version of the model developed by the author has linear time complexity, whereas the dynamic programming implementation of the same model has exponential time complexity. The accuracy of the model built by the author is between 80% and 100%. |
---|