Investigating some classification models and applying in bankruptcy prediction

Bibliographic Details
Main Author: Tran, Thi Lan Phuong
Other Authors: Tran, Duc Quynh
Format: Final Year Project
Language: English
Published: 2020
Online Access: http://repository.vnu.edu.vn/handle/VNU_123/95352
Institution: Vietnam National University, Hanoi
Description
Summary: Recent years have seen much discussion of machine intelligence and what it means for human health, productivity and wellbeing. In that discussion, machine learning has shown an increasingly important role in meeting present human needs and in predicting future events. Meanwhile, bankruptcy has become a pressing concern because of its negative effects on the economy and on wellbeing, and the problem has proven hard to control. Research on bankruptcy prediction using machine learning is therefore both necessary and practical. The purpose of this research is to study several classification models and identify the best predictive model for the task of bankruptcy prediction. The models studied are decision tree, random forest, bagging and gradient boosting; the idea, architecture, operation and characteristics of each are explored. The Polish companies’ bankruptcy dataset was chosen for the project. The work begins by analyzing and assessing the quality of the dataset. The dataset is then preprocessed, using a random forest algorithm to impute missing values and the Synthetic Minority Oversampling Technique (SMOTE) to balance the two target labels. The models are then applied to the processed dataset to find the best-performing one, and K-fold cross-validation is used to evaluate model performance. The project uses Python as the programming language, Spyder as a cross-platform integrated development environment, and Tableau and Microsoft Excel as visualization tools.
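As a rough illustration of two steps named in the summary, the sketch below shows the core idea of SMOTE (interpolating between a minority-class row and one of its nearest minority neighbours) and how K-fold cross-validation partitions the rows. It is not the project's code: this record does not say which libraries were used, the function names `smote` and `k_fold_indices` are hypothetical, and the feature rows are made up. Only the Python standard library is assumed.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by interpolating between neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base row (the base itself excluded)
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def k_fold_indices(n_rows, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; every row lands in exactly one test fold."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Toy, made-up feature rows standing in for the minority (bankrupt) class.
bankrupt = [[0.10, 2.0], [0.20, 1.8], [0.15, 2.2]]
extra = smote(bankrupt, n_new=6, k=2)
print(len(bankrupt) + len(extra), "minority rows after oversampling")

for train_idx, test_idx in k_fold_indices(n_rows=10, k=5):
    print(len(train_idx), "train rows,", len(test_idx), "test rows")
```

In practice, a model such as a decision tree or gradient boosting classifier would be fitted on each training split and scored on the matching test split, with the scores averaged over the K folds.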