Detecting fraud via statistical anomalies

Urban planners and researchers are increasingly integrating mobility data in designing smarter and sustainable cities. It is therefore crucial to identify any anomalies in the dataset to prevent poor planning or statistical interferences. Such mobility data could come from public sources or data b...

Bibliographic Details
Main Author: Lee, Cara Zheng Yan
Other Authors: Fedor Duzhin
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Subjects: Mathematical Sciences; Statistical anomalies; Anomalies; Fraud
Online Access: https://hdl.handle.net/10356/175633
Institution: Nanyang Technological University
id sg-ntu-dr.10356-175633
record_format dspace
spelling sg-ntu-dr.10356-175633 2024-05-06T15:36:28Z Detecting fraud via statistical anomalies Lee, Cara Zheng Yan Fedor Duzhin School of Physical and Mathematical Sciences FDuzhin@ntu.edu.sg Mathematical Sciences Statistical anomalies Anomalies Fraud Bachelor's degree 2024-05-02T02:30:51Z 2024-05-02T02:30:51Z 2024 Final Year Project (FYP) Lee, C. Z. Y. (2024). Detecting fraud via statistical anomalies. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/175633 en application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Mathematical Sciences
Statistical anomalies
Anomalies
Fraud
description Urban planners and researchers are increasingly integrating mobility data into the design of smarter, more sustainable cities. It is therefore crucial to identify any anomalies in such datasets to prevent poor planning or flawed statistical inference. Mobility data can come from public sources or from data brokers such as CityData, which offers products for its customers’ economic development. A few studies have detected anomalies in the September 2020 dataset provided by CityData in the course of their own research [1], [2], but studies that focus on analysing those anomalies are generally lacking. The purpose of this report is therefore to find further anomalies not reported in previous studies, to determine the percentage of manipulated pings in each Singapore zone, and to determine whether the data was intentionally manipulated. We did this by combining the statistical techniques proposed by [3] with three other mathematical methods. We found three more anomalies: a circle and line segment, excessive pings, and squares. The number of decimal places (d.p.) a ping could have was classified into 16 bins, assumed to be independent and uniformly distributed. The statistical anomalies we found were the excessive-ping anomalies, whose d.p. do not follow a uniform distribution. Our results indicate that Mandai and Southern Islands had the highest manipulated percentages, while River Valley had the lowest. Moreover, Central Area had the largest standard deviation (SD) of manipulated percentage across all regions. Thus, CityData might have intentionally manipulated the dataset to align with the interests of Singapore’s urban planners.
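
The uniformity check described in the abstract can be illustrated with a Pearson chi-square goodness-of-fit test. The sketch below is not the author's implementation: the function names, the choice of bins 0 to 15 for the d.p. count, the 5% significance level, and the sample counts are all assumptions made for illustration.

```python
from collections import Counter

NUM_BINS = 16  # the abstract's 16 bins; mapping them to 0..15 d.p. is an assumption
# Chi-square critical value for 15 degrees of freedom at the 5% level (assumed level).
CHI2_CRIT_DF15_05 = 24.996


def decimal_places(coord: str) -> int:
    """Count the digits after the decimal point in a coordinate given as text."""
    _, _, frac = coord.partition(".")
    return len(frac)


def bin_counts(coords: list, num_bins: int = NUM_BINS) -> list:
    """Tally d.p. counts into bins 0..num_bins-1 (larger values go in the last bin)."""
    tally = Counter(min(decimal_places(c), num_bins - 1) for c in coords)
    return [tally.get(i, 0) for i in range(num_bins)]


def chi_square_uniform(counts: list) -> float:
    """Pearson chi-square statistic against equal expected counts in every bin."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def looks_non_uniform(counts: list) -> bool:
    """Flag a zone whose d.p. histogram departs from uniformity at the 5% level."""
    return chi_square_uniform(counts) > CHI2_CRIT_DF15_05


# Made-up histograms: one perfectly uniform, one with an inflated first bin.
uniform_zone = [100] * NUM_BINS
skewed_zone = [400] + [80] * (NUM_BINS - 1)
print(looks_non_uniform(uniform_zone), looks_non_uniform(skewed_zone))
```

A zone would be flagged only when its statistic exceeds the critical value, so the threshold (and hence the significance level) directly controls how many zones are reported as anomalous.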
author2 Fedor Duzhin
author_facet Fedor Duzhin
Lee, Cara Zheng Yan
format Final Year Project
author Lee, Cara Zheng Yan
author_sort Lee, Cara Zheng Yan
title Detecting fraud via statistical anomalies
publisher Nanyang Technological University
publishDate 2024
url https://hdl.handle.net/10356/175633
_version_ 1800916121704988672