CLASSIFICATION ON TIME SERIES AND IMBALANCED DATA IN CUSTOMER CHURN PREDICTION

Bibliographic Details
Main Author: MARGA PRADJA, ANDRE (NIM: 23514078)
Format: Thesis
Language: Indonesian
Online Access:https://digilib.itb.ac.id/gdl/view/21063
Institution: Institut Teknologi Bandung
Description
Summary: Customer churn occurs when customers stop using a company's services or products. In telecommunications, customers who are likely to churn can be identified from the intensity and trend of their phone-call usage. Churning customers number only about 7.9% relative to active customers. This large disparity produces imbalanced data, in which one class dominates the other. Imbalance is a problem for classification because the results tend to be biased toward the majority class. To address this, we use the Synthetic Minority Over-sampling Technique (SMOTE) to balance the data.

Another problem in churn prediction is that the data are high-dimensional. Not every dimension improves the classification result; in some cases, classification performance is better when only the relevant variables are used. This high-dimensionality problem can be addressed with feature selection. In this study we use random forest with 10-fold cross-validation as external resampling and validation. We obtain an accuracy of 90.13%, a precision of 86.29%, a recall of 95.41%, and an F-score of 90.62%. The proposed method gives good prediction results and can be used to identify potential churning customers.
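SMOTE creates new minority-class (churn) samples by interpolating between a minority sample and one of its nearest minority-class neighbours, instead of simply duplicating rows. The following is a minimal pure-Python sketch of that idea, not the thesis's own implementation; the function name `smote`, the default of `k = 5` neighbours, Euclidean distance, and the fixed random seed are illustrative assumptions. In practice a library implementation such as `imblearn.over_sampling.SMOTE` would normally be used.

```python
import random
from typing import List, Sequence

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: Sequence[Sequence[float]], n_synthetic: int,
          k: int = 5, seed: int = 0) -> List[Vector]:
    """Generate n_synthetic new minority-class samples (hypothetical helper).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute the k nearest minority neighbours of every minority sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: _sq_dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # pick a minority sample
        j = rng.choice(neighbours[i])      # pick one of its neighbours
        gap = rng.random()                 # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a)
                          for a, b in zip(minority[i], minority[j])])
    return synthetic
```

For example, `smote([[0.0, 0.0], [2.0, 2.0]], n_synthetic=3)` returns three points on the segment between the two inputs, so each has equal coordinates between 0 and 2.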
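The abstract does not state whether SMOTE and feature selection were applied before or inside the 10-fold cross-validation. A common safeguard is to split first and then oversample and select features on the training folds only, so that synthetic points derived from test rows never leak into evaluation. The hypothetical helper `stratified_k_fold` below is a minimal sketch of the splitting step, assuming integer class labels; it keeps the churn ratio similar across folds. scikit-learn's `StratifiedKFold` serves the same purpose.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_k_fold(labels: Sequence[int], k: int = 10,
                      seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices into k folds that preserve the class ratio.

    Returns a list of (train_indices, test_indices) pairs.
    """
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    # Shuffle within each class, then lay the classes out one after another.
    ordered: List[int] = []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        ordered.extend(idxs)

    # Deal round-robin: each class is spread evenly and fold sizes stay balanced.
    folds: List[List[int]] = [[] for _ in range(k)]
    for pos, idx in enumerate(ordered):
        folds[pos % k].append(idx)

    splits = []
    for fold in folds:
        test = sorted(fold)
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        splits.append((train, test))
    return splits
```

Within each `(train, test)` pair, one would oversample and select features using only the `train` indices, fit the classifier (a random forest in this study) on the result, and score it on the untouched `test` indices; averaging the ten scores gives the cross-validated estimate.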
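The four reported figures are standard confusion-matrix metrics with churn as the positive class. The sketch below assumes that "F-score" means the balanced F1 score, the harmonic mean of precision and recall; the function names are illustrative and not taken from the thesis. Plugging the reported precision and recall into that formula reproduces the reported F-score, which is consistent with this reading.

```python
from typing import Dict


def f_score_from(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 with churn as the positive class.

    tp: churners predicted as churn      fp: active customers predicted as churn
    fn: churners predicted as active     tn: active customers predicted as active
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f_score": f_score_from(precision, recall),
    }


# Cross-check: the reported precision and recall imply the reported F-score.
print(round(f_score_from(0.8629, 0.9541) * 100, 2))  # → 90.62
```

Recall being higher than precision means the classifier misses few actual churners at the cost of some false alarms among active customers. Accuracy cannot be recovered from precision and recall alone, because it also depends on the true negatives.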