Workflow scheduling
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | 2013 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/52808 |
Institution: | Nanyang Technological University |
Summary: | The booming technology industry is currently witnessing rising competition among telecommunication giants, web service providers, software solution companies and others. All are striving to come out on top in terms of customer satisfaction and business impact. They are deploying huge systems and running large processes to ensure smooth service for their clients, and this in turn is generating billions of loosely structured data items which traditional systems and warehouses can no longer sustain. Thus, “Big Data” [1] technologies are needed, which help in processing such data several times faster than traditional methods.
This new class of technology used in the big data analytics environment includes Hadoop, a core open source software framework, and MapReduce.
The convergence of the big data trend with another technological trend, Cloud Computing [2], emphasizes the growing need to analyse very large, complex data sets. Cloud Computing makes massive amounts of computing power available as a utility at low cost. It also offers other benefits such as easy real-time scalability, high availability and fault tolerance.
As part of her Final Year Project, the author has worked on the basic concepts of Hadoop, implemented a Hadoop environment, and run simulations to analyse the time taken by different algorithms and Hadoop schedulers to complete tasks.
In this report, the author describes the related work and research relevant to the project and provides a detailed analysis of the collated results obtained from the simulations. The author also provides a brief description of her experience working on the Amazon Elastic Compute Cloud (Amazon EC2), which she sees as beneficial for her future work, and concludes the report with a brief summary and her key takeaways from the project. |
---|