Using Hadoop and Cassandra for taxi data analytics: A feasibility study

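This paper reports on a preliminary study to assess the feasibility of using the Open Cirrus Cloud Computing Research testbed to provide offline and online analytical support for taxi fleet operations. In the study, we benchmarked the performance gains from distributing the offline analysis of GPS location traces over multiple virtual machines using the Apache Hadoop implementation of the MapReduce paradigm. We also explored the use of the Apache Cassandra distributed database system for online retrieval of vehicle trace data. While configuring the testbed infrastructure was straightforward, we encountered severe I/O bottlenecks in running the benchmarks due to the lack of local disk storage on the compute nodes. This design limitation severely impedes the analysis of large data sets using cloud computing technologies.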

Bibliographic Details
Main Authors: KOH, Alvin Jun Yong; NGUYEN, Xuan Khoa; WOODARD, C. Jason
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2010
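Subjects: taxi fleet management; GPS data; cloud computing; Apache Hadoop; Databases and Information Systems; Numerical Analysis and Scientific Computing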
Online Access: https://ink.library.smu.edu.sg/sis_research/7045
https://ink.library.smu.edu.sg/context/sis_research/article/8048/viewcontent/Using_Hadoop_and_Cassandra_for_Taxi_Data_Analytics__A_Feasibility.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-8048
record_format dspace
spelling sg-smu-ink.sis_research-8048
2022-03-29T01:28:31Z
Using Hadoop and Cassandra for taxi data analytics: A feasibility study
KOH, Alvin Jun Yong
NGUYEN, Xuan Khoa
WOODARD, C. Jason
This paper reports on a preliminary study to assess the feasibility of using the Open Cirrus Cloud Computing Research testbed to provide offline and online analytical support for taxi fleet operations. In the study, we benchmarked the performance gains from distributing the offline analysis of GPS location traces over multiple virtual machines using the Apache Hadoop implementation of the MapReduce paradigm. We also explored the use of the Apache Cassandra distributed database system for online retrieval of vehicle trace data. While configuring the testbed infrastructure was straightforward, we encountered severe I/O bottlenecks in running the benchmarks due to the lack of local disk storage on the compute nodes. This design limitation severely impedes the analysis of large data sets using cloud computing technologies.
2010-06-01T07:00:00Z
text
application/pdf
https://ink.library.smu.edu.sg/sis_research/7045
https://ink.library.smu.edu.sg/context/sis_research/article/8048/viewcontent/Using_Hadoop_and_Cassandra_for_Taxi_Data_Analytics__A_Feasibility.pdf
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
Institutional Knowledge at Singapore Management University
taxi fleet management
GPS data
cloud computing
Apache Hadoop
Databases and Information Systems
Numerical Analysis and Scientific Computing
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic taxi fleet management
GPS data
cloud computing
Apache Hadoop
Databases and Information Systems
Numerical Analysis and Scientific Computing
description This paper reports on a preliminary study to assess the feasibility of using the Open Cirrus Cloud Computing Research testbed to provide offline and online analytical support for taxi fleet operations. In the study, we benchmarked the performance gains from distributing the offline analysis of GPS location traces over multiple virtual machines using the Apache Hadoop implementation of the MapReduce paradigm. We also explored the use of the Apache Cassandra distributed database system for online retrieval of vehicle trace data. While configuring the testbed infrastructure was straightforward, we encountered severe I/O bottlenecks in running the benchmarks due to the lack of local disk storage on the compute nodes. This design limitation severely impedes the analysis of large data sets using cloud computing technologies.
format text
author KOH, Alvin Jun Yong
NGUYEN, Xuan Khoa
WOODARD, C. Jason
author_sort KOH, Alvin Jun Yong
title Using Hadoop and Cassandra for taxi data analytics: A feasibility study
title_sort using hadoop and cassandra for taxi data analytics: a feasibility study
publisher Institutional Knowledge at Singapore Management University
publishDate 2010
url https://ink.library.smu.edu.sg/sis_research/7045
https://ink.library.smu.edu.sg/context/sis_research/article/8048/viewcontent/Using_Hadoop_and_Cassandra_for_Taxi_Data_Analytics__A_Feasibility.pdf
_version_ 1770576194198044672