Robust PRIDIT scoring method for classification fraud cases in financial data
Main Author: | |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2022 |
Subjects: | |
Online Access: | http://eprints.utm.my/id/eprint/101815/1/NorbaitiTukimanPFS2022.pdf http://eprints.utm.my/id/eprint/101815/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:147906 |
Institution: | Universiti Teknologi Malaysia |
Summary: | An increasing number of fraud cases could jeopardize business solvency. Identifying fraud with effective statistical methods, such as classification, can protect organisations from this pitfall. However, identifying fraud cases can be a statistical challenge because of several characteristics of financial datasets: they are typically large and high-dimensional, contain mixed data types, and can have an imbalanced number of fraud and non-fraud cases. This study employed Principal Component Analysis (PCA) of Relative to an Identified Distribution (RIDIT) scores, known as the PRIDIT method, to classify and identify potentially fraudulent cases. The classical PRIDIT method first transforms each analysed dataset onto a probability scale, the RIDIT score. PCA is then applied to the RIDIT score data matrix to capture the highest variability in the dataset. However, the classical PRIDIT framework has a limitation: its PCA relies on Pearson correlation measures, which are insensitive to the variability of the data. In addition, there are no specific measurements for assessing the PRIDIT method's performance under different data characteristics. Hence, this study proposed a robust PRIDIT methodology framework that incorporates several robust estimators (M-Huber, M-Tukey Bisquare, MM and LTS) to improve classification performance in identifying potentially fraudulent cases. The proposed method was applied to a German Credit Card Dataset. The analysis indicates that the highest accuracy rate, 48.5%, was obtained by robust PRIDIT based on the M-Tukey Bisquare estimator, followed by robust PRIDIT based on the MM and LTS estimators, which, like classical PRIDIT, showed similar accuracy scores of 48.1%. The lowest accuracy score, 47.9%, was obtained by robust PRIDIT based on M-Huber. 
A simulation study was also conducted to assess the performance of the different PRIDIT methods. Their behaviour was observed under different credibility percentage settings, expressed as the split between non-fraud (NF) and fraud (F) cases (95% NF; 5% F, 90% NF; 10% F, 80% NF; 20% F and 70% NF; 30% F), and under different variability levels (low, medium and high) in the datasets. The simulation results show that, at the 70% NF; 30% F credibility setting with medium variability, the accuracy rates of classical PRIDIT and of robust PRIDIT based on the M-Tukey Bisquare, MM, LTS and M-Huber estimators were 64.3%, 65.3%, 65%, 63.7% and 61.7% respectively. Thus, the findings indicate that robust PRIDIT based on M-Tukey Bisquare outperforms the other estimators, achieving the highest accuracy rate of 65.3%. In addition, the robust PRIDIT method has a better accuracy rate than the classical PRIDIT method when data variability is medium or high. Thus, this study has introduced a new method using robust PRIDIT to assess the credibility of financial data effectively. |
---|
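The two classical PRIDIT steps described in the summary (RIDIT scoring, then PCA on the score matrix) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes the commonly used signed RIDIT form, in which a category's score is the proportion of observations in lower categories minus the proportion in higher categories, and it assumes every variable is ordinal with at least two observed categories. The function names `ridit_scores` and `pridit` are hypothetical. The robust variants are not sketched because this record does not state how the robust estimators enter the framework.

```python
import numpy as np


def ridit_scores(column):
    """Signed RIDIT score for one ordinal column (assumed form, range [-1, 1])."""
    column = np.asarray(column).ravel()
    _, inverse, counts = np.unique(column, return_inverse=True, return_counts=True)
    p = counts / counts.sum()          # observed proportion of each category
    cum = np.cumsum(p)
    below = cum - p                    # proportion in strictly lower categories
    above = 1.0 - cum                  # proportion in strictly higher categories
    return (below - above)[inverse]    # map category scores back to observations


def pridit(X):
    """Classical PRIDIT sketch: RIDIT-score each column, then take the first PC.

    Assumes X has at least two columns and no column is constant.
    Returns (scores, weights); the sign convention is arbitrary.
    """
    X = np.asarray(X)
    F = np.column_stack([ridit_scores(X[:, j]) for j in range(X.shape[1])])
    corr = np.corrcoef(F, rowvar=False)        # Pearson correlation matrix
    _, eigvecs = np.linalg.eigh(corr)          # eigenvalues in ascending order
    w = eigvecs[:, -1]                         # eigenvector of the largest eigenvalue
    if w.sum() < 0:                            # fix the sign for reproducibility
        w = -w
    Z = F / F.std(axis=0)                      # RIDIT columns already have mean zero
    return Z @ w, w
```

Whether a high or a low score points towards fraud depends on how the categories are coded, so the sign of the scores has to be interpreted against the data dictionary before any cases are flagged.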