CLASSIFICATION ON TIME SERIES AND IMBALANCED DATA IN CUSTOMER CHURN PREDICTION

Bibliographic Details
Main Author: MARGA PRADJA, ANDRE (NIM: 23514078)
Format: Thesis
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/21063
Institution: Institut Teknologi Bandung
Record ID: id-itb.:21063
Record date: 2017-09-28

Description

Customer churn occurs when customers stop using a company's services or products. In telecommunications, customers who are likely to churn can be identified from the intensity and trend of their phone-call usage. Churned customers number only about 7.9% compared with active customers. This large disparity makes the data imbalanced: one class dominates the other. Imbalance is a problem for classification because the results tend to be biased toward the majority class. To address this, we use the Synthetic Minority Over-sampling Technique (SMOTE) to balance the data.

Another problem in churn prediction is that the data are high-dimensional. Not every dimension has a positive effect on the classification result, and in some cases classification performs better when only selected, relevant variables are used. The high-dimensionality problem can therefore be addressed with feature selection. In this study we use a random forest, with 10-fold cross-validation as external resampling and validation. We obtain 90.13% accuracy, 86.29% precision, 95.41% recall, and a 90.62% F-score. The proposed method yields good predictions and can be used to identify customers who are likely to churn.
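The abstract names SMOTE but does not give its settings. What follows is a minimal pure-Python sketch of the standard SMOTE idea, not the thesis's implementation. Each synthetic churner is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours. The function name `smote`, the default `k=5`, the seed, and the toy feature vectors are all assumptions for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical helper: create n_synthetic points by interpolating between
    minority samples and their k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute the k nearest minority neighbours of every minority sample.
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted(
            (math.dist(x, y), j) for j, y in enumerate(minority) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        y = minority[rng.choice(neighbours[i])]
        gap = rng.random()  # uniform in [0, 1)
        # The new point lies on the segment from x towards its neighbour y.
        synthetic.append([xi + gap * (yi - xi) for xi, yi in zip(x, y)])
    return synthetic


# Toy, made-up feature vectors (e.g. call usage in two periods).
churners = [[2.0, 1.0], [3.0, 0.5], [2.5, 0.8], [1.5, 1.2]]
n_active = 20  # hypothetical majority-class size
new_points = smote(churners, n_synthetic=n_active - len(churners), k=3)
print(len(churners) + len(new_points))  # 20
```

For real work, imbalanced-learn's `imblearn.over_sampling.SMOTE` provides a maintained implementation of the same technique.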
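The reported figures can be cross-checked. The F-score is the harmonic mean of precision and recall, F = 2PR / (P + R), and plugging in the reported precision (86.29%) and recall (95.41%) reproduces the reported 90.62%. The helper names below are hypothetical, and the confusion-matrix counts are toy values, not the thesis data.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F-score, treating churn as
    the positive class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall, f_score(precision, recall)


# Check the abstract's figures: P = 86.29%, R = 95.41%.
print(round(100 * f_score(0.8629, 0.9541), 2))  # 90.62

# Toy confusion-matrix counts (hypothetical, not from the thesis).
acc, prec, rec, f1 = classification_metrics(tp=8, fp=2, fn=1, tn=89)
```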
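The abstract says 10-fold cross-validation is used for resampling and validation. It does not say whether the folds were stratified or at which stage SMOTE was applied. The sketch below assumes stratified folds, so that each fold keeps roughly the original churn ratio. It covers only the fold assignment, and comments mark where oversampling and the random forest would fit. `stratified_kfold`, its seed, and the 8%-churn labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k folds, dealing each class's
    shuffled indices round-robin so every fold keeps roughly the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    counter = 0  # shared across classes so fold sizes stay balanced
    for indices in by_class.values():
        rng.shuffle(indices)
        for idx in indices:
            folds[counter % k].append(idx)
            counter += 1
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        # Hypothetical next steps per fold: oversample (e.g. SMOTE) the
        # training rows only, fit the classifier (a random forest in the
        # thesis) on them, then score on the untouched test rows.
        yield train, test


labels = [1] * 8 + [0] * 92  # hypothetical: 8 churners, 92 active customers
print([len(test) for _, test in stratified_kfold(labels, k=10)])  # ten folds of 10
```

When SMOTE is combined with cross-validation, it is generally applied only to the training part of each fold. If synthetic points derived from test-fold samples leak into training, the reported scores are inflated.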