MACHINE LEARNING: KNN AND CLUSTERING IMPLEMENTATION ON FRAUD DETECTION SYSTEM CASE

Bibliographic Details
Main Author: Naufan Muharam, Athur
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/53772
Institution: Institut Teknologi Bandung
Description
Summary: Financial technology has developed rapidly and been widely adopted in the Industry 4.0 era. It lets people carry out financial transactions and other financial activities more easily through channels such as m-banking, Internet banking, and digital payment. The sharp rise in adoption can be traced to several factors, including the widespread penetration of handheld devices, smartphones in particular. With easier and more seamless access, people make more transactions than before, and market transaction volume has grown accordingly. This growth brings transaction security risks, including fraud. This final project implements several combinations of machine learning algorithms to build a fraud detection system that better prevents financial fraud. The algorithms used and tested are KNN combined with clustering (DBSCAN, KMeans, OPTICS). The implementation follows the CRISP-DM methodology, which comprises (i) defining business needs, (ii) understanding the data, (iii) data preprocessing, (iv) modelling and optimization, and (v) testing. In the data preprocessing phase, the imbalanced dataset is handled with undersampling, followed by feature scaling. In the modelling and optimization phase, grid search with k-fold cross-validation is used for KNN and the elbow method is used for clustering. Testing and evaluation use seven metrics: false positive rate (FPR), area under the curve, recall, precision, accuracy, F1 score, and duration. The results show that, on the PaySim test data, the combination of KNN and KMeans gives the best recall of the combinations tested. KNN with KMeans achieves FPR 0.74%, area under the curve 96.64%, recall 88.45%, precision 26.46%, accuracy 99.23%, and F1 score 40.73%, and takes 17.9 seconds.
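The threshold-based metrics named in the summary can be sketched from a confusion matrix as follows. This is a minimal illustration and not the author's evaluation code: the `Confusion` class, the function names, and the example counts are hypothetical, and area under the curve and duration are left out because they need ranked scores and timing rather than counts. The last lines check that the reported F1 score is consistent with the reported precision and recall.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts, with fraud as the positive class."""
    tp: int  # fraud correctly flagged
    fp: int  # legitimate transactions wrongly flagged
    tn: int  # legitimate transactions correctly passed
    fn: int  # fraud missed


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics(c: Confusion) -> dict:
    """Compute FPR, recall, precision, accuracy, and F1 from counts."""
    total = c.tp + c.fp + c.tn + c.fn
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    fpr = c.fp / (c.fp + c.tn) if (c.fp + c.tn) else 0.0
    accuracy = (c.tp + c.tn) / total if total else 0.0
    return {
        "fpr": fpr,
        "recall": recall,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1_from(precision, recall),
    }


# Hypothetical counts for an imbalanced test set (not taken from the thesis).
example = Confusion(tp=8, fp=24, tn=966, fn=2)
example_metrics = metrics(example)

# Consistency check on the figures reported in the summary:
# precision 26.46% and recall 88.45% should give an F1 near 40.73%.
reported_f1 = f1_from(0.2646, 0.8845)

if __name__ == "__main__":
    for name, value in example_metrics.items():
        print(f"{name}: {value:.4f}")
    print(f"F1 from reported precision and recall: {reported_f1:.4f}")
```

With fraud being rare, the negative class dominates accuracy and FPR, so a model can score above 99% accuracy with an FPR below 1% while precision stays low. This is why the summary singles out recall when comparing the combinations.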