CLASSIFICATION ON TIME SERIES AND IMBALANCED DATA IN CUSTOMER CHURN PREDICTION
Customer churn occurs when customers stop using a company's services or products. In telecommunications, customers who are likely to churn can be identified from the intensity and trend of their phone call usage. The number of churned customers is only about 7.9% compared with active c...
Main Author: | MARGA PRADJA - NIM: 23514078 , ANDRE |
---|---|
Format: | Theses |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/21063 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:21063 |
---|---|
spelling |
id-itb.:21063 2017-09-28T15:07:22Z CLASSIFICATION ON TIME SERIES AND IMBALANCED DATA IN CUSTOMER CHURN PREDICTION MARGA PRADJA - NIM: 23514078 , ANDRE Indonesia Theses INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/21063 text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
Customer churn occurs when customers stop using a company's services or products. In telecommunications, customers who are likely to churn can be identified from the intensity and trend of their phone call usage. The number of churned customers is only about 7.9% compared with active customers. This large disparity produces an imbalanced dataset in which one class dominates the other. This is a problem for classification because the results tend to be biased toward the majority class. To solve this problem, we use the synthetic minority over-sampling method (SMOTE) to balance the data.

Another problem in churn prediction is that the data is high-dimensional. Not every dimension has a positive effect on the classification result; in some cases, classification performs better when only selected, relevant variables are used. The high-dimensionality problem can be addressed with feature selection. In this study we use a random forest with 10-fold cross-validation as external resampling and validation. We obtain 90.13% accuracy, 86.29% precision, 95.41% recall, and an F-score of 90.62%. The proposed method gives good prediction results and can be used to predict potential churn customers. |
format |
Theses |
author |
MARGA PRADJA - NIM: 23514078 , ANDRE |
title |
CLASSIFICATION ON TIME SERIES AND IMBALANCED DATA IN CUSTOMER CHURN PREDICTION |
url |
https://digilib.itb.ac.id/gdl/view/21063 |
_version_ |
1821120348695822336 |
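
The abstract in this record names synthetic minority over-sampling as the way the churn class is balanced, without giving details. The block below is a minimal, simplified sketch of that idea in plain Python, not the thesis implementation: it picks a minority sample, picks one of its k nearest minority neighbours, and creates a new point on the line segment between them. The function name `smote`, its parameters, and the toy call-usage data are all hypothetical and chosen only for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Simplified sketch of the SMOTE idea (an assumption about the general
    technique, not the thesis code).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority samples, nearest to `base` first.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: each row is [call_minutes_month_1, call_minutes_month_2].
active = [[300 + 10 * i, 310 + 10 * i] for i in range(20)]  # majority class
churned = [[120, 40], [150, 60], [90, 20], [200, 80]]       # minority class

# Generate enough synthetic churners to match the size of the majority class.
new_points = smote(churned, n_new=len(active) - len(churned), k=3)
balanced_churned = churned + new_points
print(len(active), len(balanced_churned))
```

In a cross-validated setup such as the 10-fold scheme the abstract mentions, over-sampling is usually applied to the training folds only, so that synthetic points do not leak into the fold used for validation. Whether the thesis does this is not stated in this record.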