ANOMALY DETECTION IN FINANCIAL STATEMENT USING PROBABILISTIC NEURAL NETWORK


Bibliographic Details
Main Author: NADYA SEKARIANI (NIM: 13512017), ARIEZA
Format: Final Project
Language: Indonesia
Online Access:https://digilib.itb.ac.id/gdl/view/25761
Institution: Institut Teknologi Bandung
Description
Summary:<p align="justify">Financial statements are representations of a company’s financial health. New auditors who do not yet have an expert’s experience need a tool to help them identify financial statements that may contain anomalies, so that fraudulent statements are not overlooked. The fraud triangle is a concept describing three factors present in a fraudulent act involving a financial statement, and these three factors are used to detect anomalies. Because the factors are abstract, a set of proxies is used in their place; the proxies serve as the features in the machine learning process, and the variables are extracted from raw financial statement files to train the model. The task is a binary classification problem, solved here with a Probabilistic Neural Network (PNN). Preprocessing is applied to the extracted data by cleaning missing values and normalizing the features, and feature selection is employed to keep the 10 most relevant features. Because the original dataset is imbalanced, oversampling with SMOTE is applied. The performance of the model is poor, with a very small recall score. This may be caused by the acquired dataset, which is highly varied and small in size and therefore not a good representation of real situations: PNN works well with small datasets, but only if the data are representative enough. In conclusion, the model created in this study is not yet ready for use in real practice, but acquiring more consistent data may help.</p>
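The study's code is not published with this record. As a minimal sketch of the classifier it describes, the following implements a Gaussian-kernel PNN in NumPy: each training sample contributes one kernel unit, a class's score is the mean kernel response over that class's samples, and the predicted class is the argmax. The toy data, the smoothing parameter `sigma`, and the function name `pnn_predict` are illustrative assumptions, not the study's actual features or settings.

```python
import numpy as np

def pnn_predict(X_train, y_train, X_test, sigma=0.5):
    """Probabilistic Neural Network sketch: one Gaussian kernel per
    training sample; a class's score for a query point is the mean
    kernel response over that class's training samples."""
    classes = np.unique(y_train)
    preds = []
    for x in X_test:
        scores = []
        for c in classes:
            Xc = X_train[y_train == c]
            d2 = np.sum((Xc - x) ** 2, axis=1)          # squared distances
            scores.append(np.mean(np.exp(-d2 / (2 * sigma ** 2))))
        preds.append(classes[np.argmax(scores)])
    return np.array(preds)

# Hypothetical toy data standing in for the extracted fraud-triangle
# proxies: class 0 = normal statements, class 1 = fraudulent (minority).
rng = np.random.default_rng(0)
X_normal = rng.normal(0.0, 1.0, size=(40, 3))
X_fraud = rng.normal(2.0, 1.0, size=(10, 3))
X = np.vstack([X_normal, X_fraud])
y = np.array([0] * 40 + [1] * 10)

y_pred = pnn_predict(X, y, X)
# Recall on the minority (fraud) class, the metric the study reports as low.
recall = np.sum((y_pred == 1) & (y == 1)) / np.sum(y == 1)
```

In practice the steps the abstract lists before classification (missing-value cleaning, normalization, selecting the 10 best features, SMOTE oversampling) would transform `X` and `y` before they reach `pnn_predict`; the choice of `sigma` controls how strongly the PNN smooths over the training samples and matters most when, as here, the dataset is small.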