Churn prediction in telecommunication with social network analysis and MapReduce

The telecommunication market is becoming saturated because most telcos apply the same pricing model, so it is very easy for customers to leave one telco and join a competitor. Churn prediction is a data mining technique to predict the probability of a customer wanting to leave the se...

Bibliographic Details
Main Author: Nguyen, Ngoc Tram Anh
Other Authors: Ng Wee Keong
Format: Final Year Project
Language: English
Published: 2016
Subjects:
Online Access: http://hdl.handle.net/10356/66817
Institution: Nanyang Technological University
id sg-ntu-dr.10356-66817
record_format dspace
spelling sg-ntu-dr.10356-66817
2023-03-03T20:40:03Z
Churn prediction in telecommunication with social network analysis and MapReduce
Nguyen, Ngoc Tram Anh
Ng Wee Keong
School of Computer Engineering
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Bachelor of Engineering (Computer Science)
2016-04-27T02:48:51Z
2016
Final Year Project (FYP)
http://hdl.handle.net/10356/66817
en
Nanyang Technological University
53 p.
application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
description The telecommunication market is becoming saturated because most telcos apply the same pricing model, so it is very easy for customers to leave one telco and join a competitor. Churn prediction is a data mining technique to predict the probability of a customer wanting to leave the service. In this project, a churn prediction classifier is implemented with data from an anonymous telecommunication company. The classifier is binary, with two labels: churn and non-churn. We aggregate data mining features from Call Detail Records (CDR), including basic features such as the number of messages in a month and the total duration of incoming/outgoing calls in a month. Besides these basic features, graph theory features (Label Propagation and PageRank) are also incorporated in the feature selection method. Because of the huge amount of data, MapReduce is used to parallelize and partition the graph computation so that graphs of 600,000 nodes and more can be processed comfortably on a personal computer. We achieve commendable classification results, with all classifiers returning around 90% accuracy or more. The classifiers used are Naïve Bayes, Logistic KNN, Logistic Regression, Decision Tree, Random Forest and Bagging. Logistic Regression consistently outperforms the other classifiers, with a highest result of 96.9% accuracy and an AUC score of 0.988. We are confident that the telco will profit in the long run if it offers the accurately identified potential churners attractive packages to keep them in the service.
author2 Ng Wee Keong
author_facet Ng Wee Keong
Nguyen, Ngoc Tram Anh
format Final Year Project
author Nguyen, Ngoc Tram Anh
author_sort Nguyen, Ngoc Tram Anh
title Churn prediction in telecommunication with social network analysis and MapReduce
title_sort churn prediction in telecommunication with social network analysis and mapreduce
publishDate 2016
url http://hdl.handle.net/10356/66817
_version_ 1759855296843350016
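
The description mentions computing PageRank over the call graph with MapReduce. As a rough illustration only, and not the project's actual implementation, one PageRank iteration can be written as a map step that emits rank contributions along edges and a reduce step that sums them per node. The sketch below runs in a single process; the function names, the toy graph `calls`, the damping factor of 0.85 and the uniform handling of nodes without outgoing edges are all assumptions made for this example.

```python
from collections import defaultdict


def map_phase(graph, ranks):
    """Map step: each caller splits its current rank evenly among its callees."""
    for node, neighbours in graph.items():
        if neighbours:
            share = ranks[node] / len(neighbours)
            for target in neighbours:
                yield target, share


def reduce_phase(pairs):
    """Reduce step: sum the contributions received by each node."""
    sums = defaultdict(float)
    for target, share in pairs:
        sums[target] += share
    return sums


def pagerank(graph, damping=0.85, iterations=50):
    """Iterate map and reduce; damping and iteration count are assumed defaults."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        sums = reduce_phase(map_phase(graph, ranks))
        # Rank held by nodes with no outgoing edges is spread uniformly.
        dangling = sum(ranks[node] for node in nodes if not graph.get(node))
        ranks = {
            node: (1 - damping) / n
            + damping * (sums.get(node, 0.0) + dangling / n)
            for node in nodes
        }
    return ranks


# Hypothetical toy call graph: caller -> list of callees.
calls = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(calls)
```

In a real MapReduce job the map and reduce steps would run over partitions of the edge list, which is what allows a large graph to be processed in pieces; here they are ordinary generators and dictionaries so the data flow is easy to follow.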