Big data analytics for smart transportation

In the Singapore urban rail network, interior stations and tracks are highly correlated. If one or more specific stations were disrupted, the whole network would be gravely affected. It is therefore pivotal to recognize disruptions at these critical stations and put more human and material res...

Bibliographic Details
Main Author:	Tan, Judith Yi Ru
Other Authors:	Li Mo
Format:	Final Year Project
Language:	English
Published:	2019
Subjects:	DRNTU::Engineering::Computer science and engineering
Online Access:	http://hdl.handle.net/10356/76997
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-76997
record_format	dspace
spelling	sg-ntu-dr.10356-76997 2023-03-03T20:58:51Z Big data analytics for smart transportation Tan, Judith Yi Ru Li Mo School of Computer Science and Engineering SMRT Corporation Ltd. DRNTU::Engineering::Computer science and engineering Bachelor of Engineering (Computer Science) 2019-04-30T05:25:06Z 2019-04-30T05:25:06Z 2019 Final Year Project (FYP) http://hdl.handle.net/10356/76997 en Nanyang Technological University 34 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Computer science and engineering
spellingShingle	DRNTU::Engineering::Computer science and engineering Tan, Judith Yi Ru Big data analytics for smart transportation
description	In the Singapore urban rail network, interior stations and tracks are highly correlated. If one or more specific stations were disrupted, the whole network would be gravely affected. It is therefore pivotal to recognize disruptions at these critical stations and to commit more human and material resources to ensure an efficient and timely “failure response strategy” plan. Smart card data provided by the Land Transport Authority (LTA), together with disruption events reported on social media, provide an opportunity to analyse three key features to find the critical stations that are disrupted. The lack of past information on disruptions at certain hours and stations makes a detailed analysis of smart card data challenging, if not nearly impossible. Therefore, in this final year project, an anomaly detection algorithm is implemented to detect disruptions in the smart card data, using two approaches to handle anomalies that have yet to be discovered. The in-sample approach focuses on finding a series of statistical models to detect disruptions (anomalies) in the transit data flow. The out-of-sample approach takes the best model developed by the in-sample approach and uses it as the detection model for stations without past reported disruptions. The out-of-sample approach makes it possible to tell whether disruptions could have affected stations, at specific hours, for which no disruption has ever been reported. Gaussian methods (univariate and multivariate) are adopted in this project because they are computationally efficient and combine statistics with supervised machine learning. Comparing the in-sample and out-of-sample approaches by F1-score, the “duration difference” feature achieves the highest scores of the three key features: 0.56 in-sample and 0.38 out-of-sample. The other two features are “tap-in”, with F1-scores of 0.32 and 0.20, and “tap-out”, with F1-scores of 0.38 and 0.22.
Combinations of the “tap-in”, “tap-out” and “duration difference” features, which can only be built using the multivariate Gaussian method, were also tested and achieved F1-scores of 0.61 and 0.02 respectively. Therefore, the features extracted from the smart card data are the preferred indicators for detecting disruptions in smart card data. The poor performance of the out-of-sample approach is probably due to the lack of past disruption samples; however, the two approaches complement each other in detecting disruptions at stations, even those without past disruption information. If a longer period of historical data (ours covers 3 months) is used when building both the in-sample and out-of-sample models, better performance can be achieved. From the two “best” performing indicators, the location of stations and the date and time of disruptions can be accurately identified, which can then enable transit agencies to respond to disruptions in a more timely, accurate and effective manner.
author2	Li Mo
author_facet	Li Mo Tan, Judith Yi Ru
format	Final Year Project
author	Tan, Judith Yi Ru
author_sort	Tan, Judith Yi Ru
title	Big data analytics for smart transportation
title_short	Big data analytics for smart transportation
title_full	Big data analytics for smart transportation
title_fullStr	Big data analytics for smart transportation
title_full_unstemmed	Big data analytics for smart transportation
title_sort	big data analytics for smart transportation
publishDate	2019
url	http://hdl.handle.net/10356/76997
_version_	1759856342425665536
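
The Gaussian anomaly detection summarised in the description field can be illustrated with a minimal sketch. This is not the project's code: the function names, the use of log-densities, and the grid search that picks the threshold by maximising F1 on labelled samples are all assumptions made for illustration, since the abstract does not say how the threshold was chosen. With a single feature column the same code reduces to the univariate case.

```python
import numpy as np

def fit_gaussian(X):
    """Estimate the mean vector and covariance matrix.

    X must be 2-D with shape (n_samples, n_features); pass a single
    column, e.g. X[:, [0]], for the univariate case.
    """
    mu = X.mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    return mu, cov

def log_density(X, mu, cov):
    """Log of the multivariate Gaussian density for each row of X."""
    d = X.shape[1]
    diff = X - mu
    _, logdet = np.linalg.slogdet(cov)
    # Squared Mahalanobis distance of every row from the mean.
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall (1 = disruption)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_threshold(logp, y_true, n_steps=200):
    """Hypothetical helper: grid-search the log-density threshold
    below which a sample is flagged, keeping the best F1."""
    best_eps, best_f1 = None, -1.0
    for eps in np.linspace(logp.min(), logp.max(), n_steps):
        score = f1_score(y_true, (logp < eps).astype(int))
        if score > best_f1:
            best_eps, best_f1 = eps, score
    return best_eps, best_f1
```

In use, one would fit `fit_gaussian` on feature rows (for example tap-in, tap-out and duration difference per station and hour), score labelled rows with `log_density`, and pass the scores and reported-disruption labels to `select_threshold`; rows whose log-density falls below the chosen threshold are flagged as possible disruptions.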


Similar Items