FRAUD DETECTION IN FINANCIAL TRANSACTIONS USING LEAST-SQUARES EXPECTATION AND QUANTILE-BASED PROBABILISTIC SUPPORT VECTOR MACHINE WITH SYMBOLIC DATA
Main Author: | |
---|---|
Format: | Theses |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/76541 |
Institution: | Institut Teknologi Bandung |
Summary: | Financial systems remain essential to meeting everyday human needs, yet both
digital and non-digital financial systems have gaps in their security. One of the
biggest threats is fraud in digital transactions, which can cause huge losses.
Industry has therefore implemented fraud detection systems based on machine
learning. One popular machine learning algorithm for classification is the support
vector machine (SVM), because it works on both structured and unstructured data,
is effective on high-dimensional data, and is flexible in its choice of kernel.
However, SVM scales poorly to large datasets such as fraud transaction data,
because solving the quadratic programming (QP) problem requires large amounts of
memory and computation time. Prediction time, in turn, depends on the complexity
of the model, which is driven by hyperparameters such as the kernel type and by
the size of the data itself. This matters because fraud detection systems rely
heavily on computational speed to flag fraudulent transactions in near real time
and to preserve a good user experience.
To reduce computation time, the least-squares SVM (LS-SVM) replaces the QP problem
with a system of linear equations, which lowers computational complexity,
especially for large data. In addition, one data reduction technique is to
aggregate classical data into symbolic data, such as numerical histograms and
categories. Whereas a classical data point stores a single value, a symbolic data
point carries additional information; for example, a histogram stores the range of
values as bins together with the frequency of each bin. This technique is expected
to retain as much information as possible in the smallest possible volume, which
reduces memory use and computation time. Al-Ma'shumah et al. (2022) modified the
probabilistic support vector machine (PSVM) model of Abaszade et al. (2018) into
the expectation-based probabilistic SVM (EPSVM) and the quantile-based
probabilistic SVM (QPSVM). These models can be applied to histogram data, which
they represent by its expectation and by its quantiles, respectively.
In this research, a machine learning-based fraud detection system is built using
least-squares versions of the SVM, EPSVM, and QPSVM algorithms. Development
follows the CRISP-DM stages, beginning with exploratory data analysis (EDA) and
data preparation, which includes random undersampling, feature selection, and
transformation of classical data into histogram data. Modeling and experiments
then vary hyperparameters such as Nmember, the binning method, the kernel, and the
p-quantile value. The models are evaluated on recall, false positive rate (FPR),
and AUC, as well as on computation time, split into training time and prediction
time. For the standard LS-SVM, the best results come from the RBF kernel, with a
recall of 0.900, an FPR of 0.012, an AUC of 0.944, a training time of 5144
seconds, and a prediction time of 7.5 seconds. The best LS-PSVM model is QPSVM
with p = 0.5, a polynomial kernel, Nmember = 5, and Doane binning, which yields a
recall of 0.860, an FPR of 0.039, an AUC of 0.910, a training time of 405 seconds,
and a prediction time of 2 seconds. The evaluation metrics of the best LS-PSVM
model are therefore still worse than those of the standard LS-SVM model (lower
recall and AUC, higher FPR), because information is lost when classical data is
transformed into its symbolic representation. However, the shorter training and
prediction times reduce operational computing costs, allow more frequent model
updates and experiments, and speed up real-time fraud detection and response,
thereby improving the user experience of the financial system. |
---|
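The summary mentions two building blocks: reducing a histogram to its expectation or a p-quantile, and training an LS-SVM by solving a linear system instead of a QP problem. The following is a minimal sketch of both in Python with NumPy, not the thesis implementation. The function names, the parameters `gamma` and `sigma`, the uniform-within-bin assumption for the quantile, and the choice to solve the regression-form LS-SVM system on labels of -1 and +1 are all assumptions made for illustration.

```python
import numpy as np

def histogram_summary(values, p=0.5, bins="doane"):
    """Reduce one group of classical values to (expectation, p-quantile)
    of its histogram. Assumption: values are uniform inside each bin."""
    counts, edges = np.histogram(values, bins=bins)
    freq = counts / counts.sum()              # relative frequency per bin
    mids = 0.5 * (edges[:-1] + edges[1:])     # bin midpoints
    expectation = float(np.sum(freq * mids))
    # Piecewise-linear CDF evaluated at the bin edges, inverted at p.
    cdf = np.concatenate(([0.0], np.cumsum(freq)))
    quantile = float(np.interp(p, cdf, edges))
    return expectation, quantile

def rbf_kernel(A, B, sigma=1.0):
    """RBF kernel matrix between the rows of A and the rows of B."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Train an LS-SVM by solving one (n+1) x (n+1) linear system:
        [ 0   1^T         ] [ b     ]   [ 0 ]
        [ 1   K + I/gamma ] [ alpha ] = [ y ]
    with labels y in {-1, +1}. No QP solver is involved."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.asarray(y, dtype=float)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]                    # bias b, coefficients alpha

def lssvm_predict(X_train, b, alpha, X_new, sigma=1.0):
    """Predicted class is the sign of the kernel expansion plus bias."""
    return np.sign(rbf_kernel(X_new, X_train, sigma) @ alpha + b)

if __name__ == "__main__":
    # One hypothetical group of 5 transaction amounts (Nmember = 5).
    group = np.array([12.0, 15.0, 15.5, 40.0, 300.0])
    print(histogram_summary(group, p=0.5))

    # Toy two-class data, made up for illustration only.
    X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
    y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])
    b, alpha = lssvm_fit(X, y)
    print(lssvm_predict(X, b, alpha, X))
```

In this sketch the training cost is dominated by one dense linear solve whose size grows with the number of training rows, so aggregating several transactions into one histogram row shrinks that system directly. This is consistent with the shorter training times reported in the summary, although the actual aggregation scheme used in the thesis may differ.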