MACHINE LEARNING: KNN AND CLUSTERING IMPLEMENTATION ON FRAUD DETECTION SYSTEM CASE

Bibliographic Details
Main Author: Naufan Muharam, Athur
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/53772
Institution: Institut Teknologi Bandung
id id-itb.:53772
spelling id-itb.:53772; 2021-03-10T09:37:59Z; MACHINE LEARNING: KNN AND CLUSTERING IMPLEMENTATION ON FRAUD DETECTION SYSTEM CASE; Naufan Muharam, Athur; Indonesia; Final Project; Machine learning, fraud detection system, fraud, nearest neighbors, KNN, KMeans, DBSCAN, OPTICS, semi-supervised learning; INSTITUT TEKNOLOGI BANDUNG; https://digilib.itb.ac.id/gdl/view/53772; text
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description Financial technology has developed rapidly and been widely adopted in the Industry 4.0 era. It makes financial transactions and other financial activities easier through several forms, including m-banking, Internet banking, and digital payment. The sharp increase in adoption can be traced to several factors, notably the widespread penetration of handheld devices, smartphones in particular. With easier and more seamless access, people make more transactions than before, so market transaction volume has grown considerably. This growth raises transaction security risks, including fraud. This final project implements several machine learning combinations to build a fraud detection system that better prevents financial fraud. The algorithms used and tested are KNN combined with clustering (DBSCAN, KMeans, OPTICS). The implementation follows the CRISP-DM methodology, which includes (i) defining business needs, (ii) understanding the data, (iii) data preprocessing, (iv) modelling and optimization, and (v) testing. In the data preprocessing phase, the imbalanced dataset is processed using undersampling, followed by feature scaling. In the modelling and optimization phase, grid search with k-fold cross-validation is used for the KNN algorithm and the elbow method is used for clustering. Testing and evaluation use seven metrics: false positive rate (FPR), area under the curve (AUC), recall, precision, accuracy, F1 score, and duration. The results show that, on the PaySim test data, the KNN with KMeans combination gives the best recall of the combinations tested. Its metrics are as follows: FPR 0.74%, AUC 96.64%, recall 88.45%, precision 26.46%, accuracy 99.23%, F1 score 40.73%, and a duration of 17.9 seconds.
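As an illustration of how five of the seven metrics named in the description relate to one another, the following is a minimal sketch, not the author's code. The confusion-matrix counts are hypothetical and chosen only for illustration; the names ConfusionCounts, evaluate, and f1_from are assumptions introduced here. AUC and duration are left out because they need ranked classifier scores and wall-clock timing, neither of which the record provides. The last lines check that the reported F1 score is consistent with the reported precision and recall.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Binary confusion-matrix counts, with fraud as the positive class."""
    tp: int  # fraudulent transactions correctly flagged
    fp: int  # legitimate transactions wrongly flagged
    tn: int  # legitimate transactions correctly passed
    fn: int  # fraudulent transactions missed


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def evaluate(c: ConfusionCounts) -> dict:
    """Compute the threshold-based metrics; assumes no denominator is zero."""
    recall = c.tp / (c.tp + c.fn)
    precision = c.tp / (c.tp + c.fp)
    return {
        "fpr": c.fp / (c.fp + c.tn),
        "recall": recall,
        "precision": precision,
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn),
        "f1": f1_from(precision, recall),
    }


# Hypothetical counts, NOT taken from the thesis.
counts = ConfusionCounts(tp=90, fp=250, tn=9650, fn=10)
metrics = evaluate(counts)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")

# Consistency check on the figures reported in the description:
# precision 26.46% and recall 88.45% imply the reported F1 of 40.73%.
reported_f1 = f1_from(0.2646, 0.8845)
print(f"F1 implied by reported precision and recall: {100 * reported_f1:.2f}%")
```

The hypothetical counts also show why accuracy can sit near 100% while precision stays low: on a heavily imbalanced dataset, the large number of true negatives dominates accuracy, whereas precision depends only on the flagged transactions.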
format Final Project
author Naufan Muharam, Athur
title MACHINE LEARNING: KNN AND CLUSTERING IMPLEMENTATION ON FRAUD DETECTION SYSTEM CASE
url https://digilib.itb.ac.id/gdl/view/53772
_version_ 1822929420306874368