DESIGN OF A CUSTOMER CHURN PREDICTION MODEL USING DATA MINING AT PT X
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/76222 |
Institution: | Institut Teknologi Bandung |
Summary: | PT X is a company that sells daily necessities online, such as fruit, staple foods,
vegetables, meat, and other groceries. PT X serves both B2B (Business to Business) and B2C
(Business to Consumer) customers. Its B2B business has been running steadily and generating
profit, but its B2C business has not yet generated profit consistently. One contributing
factor is high marketing costs. Competition from other companies offering similar services
is a further challenge. PT X therefore needs to retain customers who have already
transacted with it.
Customer retention at PT X has two focuses: new users and returning users. The average churn
rate is currently 66% for new users who joined between July 2021 and May 2022, and 38.16% for
returning users who made transactions between July 2021 and May 2022. These churn rates are
high because PT X's customer retention strategy is not effective enough and PT X does not
know which customers the strategy should focus on. This research therefore builds a customer
churn prediction model from customers' transaction data.
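The abstract does not state how churn is defined. A common transaction-based definition, assumed here purely for illustration, marks a customer as churned if they ordered at least once up to a cutoff date but placed no order within a fixed horizon after it; the churn rate is then the share of such customers. The sketch below uses pandas; the column names `customer_id` and `order_date`, the function `label_churn`, and the 30-day horizon are hypothetical and not taken from the research.

```python
import pandas as pd


def label_churn(transactions: pd.DataFrame, cutoff: str, horizon_days: int = 30) -> pd.Series:
    """Hypothetical churn definition: a customer with at least one order on or before
    `cutoff` is churned (True) if they place no order in the `horizon_days` after it."""
    dates = pd.to_datetime(transactions["order_date"])
    cutoff_ts = pd.Timestamp(cutoff)
    horizon_end = cutoff_ts + pd.Timedelta(days=horizon_days)

    # Customers seen up to the cutoff form the population at risk of churning.
    active = transactions.loc[dates <= cutoff_ts, "customer_id"].unique()
    # Customers who order again inside the horizon window count as retained.
    returned = transactions.loc[
        (dates > cutoff_ts) & (dates <= horizon_end), "customer_id"
    ].unique()

    # True = churned, indexed by customer id.
    return pd.Series(~pd.Index(active).isin(returned), index=active, name="churned")


if __name__ == "__main__":
    # Toy transactions for illustration only (not PT X data).
    tx = pd.DataFrame(
        {
            "customer_id": ["a", "a", "b", "c", "c"],
            "order_date": ["2021-07-03", "2021-08-10", "2021-07-15", "2021-07-20", "2021-09-30"],
        }
    )
    labels = label_churn(tx, cutoff="2021-07-31", horizon_days=30)
    print(labels.to_dict())
    print(f"churn rate: {labels.mean():.1%}")  # churn rate: 66.7%
```

Here customer `a` reorders within 30 days of the cutoff and is retained, while `b` never reorders and `c` reorders only after the horizon, so two of three customers are labeled as churned. The same labels could serve as the target variable for the models described in the next paragraph.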
The research follows the CRISP-DM (Cross-Industry Standard Process for Data Mining)
methodology. Four algorithms are used for modeling: Logistic Regression, Random Forest,
Adaptive Boosting (AdaBoost), and Extreme Gradient Boosting (XGBoost). The dataset is
prepared using Microsoft Excel and Python.
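As a hedged illustration of the modeling step, the sketch below compares the four algorithm families with 5-fold stratified cross-validation in scikit-learn and reports the same four metrics as the abstract. Several choices are assumptions, not the research's actual setup. scikit-learn's `GradientBoostingClassifier` stands in for Extreme Gradient Boosting, so the sketch depends only on scikit-learn (`xgboost.XGBClassifier` could be swapped in). Hyperparameters are defaults or arbitrary, the function names are hypothetical, and `make_classification` generates synthetic data in place of PT X's prepared dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

METRICS = ["accuracy", "precision", "recall", "f1"]


def build_candidates(seed: int = 42) -> dict:
    """The four model families named in the abstract (XGBoost replaced, see lead-in)."""
    return {
        # Scaling helps the linear model converge; tree ensembles do not need it.
        "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=seed),
        "AdaBoost": AdaBoostClassifier(random_state=seed),
        # Stand-in for Extreme Gradient Boosting.
        "Gradient Boosting": GradientBoostingClassifier(random_state=seed),
    }


def compare_models(X, y, seed: int = 42) -> dict:
    """Mean 5-fold stratified CV accuracy, precision, recall and F1 per candidate."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    results = {}
    for name, model in build_candidates(seed).items():
        scores = cross_validate(model, X, y, cv=cv, scoring=METRICS)
        results[name] = {m: float(scores[f"test_{m}"].mean()) for m in METRICS}
    return results


if __name__ == "__main__":
    # Synthetic placeholder data (class 1 = "churned"); NOT PT X's transaction data.
    X, y = make_classification(n_samples=500, n_features=8, weights=[0.4, 0.6], random_state=0)
    results = compare_models(X, y)
    for name, row in results.items():
        print(name, {k: round(v, 3) for k, v in row.items()})
    best = max(results, key=lambda n: results[n]["f1"])
    print("Best by F1:", best)
```

Selecting the winner by F1 is one reasonable choice when churners are the positive class, since it balances precision and recall; the abstract does not say which criterion was used to select Random Forest.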
Random Forest is the best-performing model for both new users and returning users. For new
users, it achieves an accuracy of 74.65%, precision of 75.89%, recall of 90.7%, and F1 score
of 82.63%. For returning users, it achieves an accuracy of 86.7%, precision of 89.15%, recall
of 93.84%, and F1 score of 91.44%.
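For reference, the four reported metrics follow from a binary confusion matrix with churn as the positive class, and F1 is the harmonic mean of precision and recall. The minimal standard-library sketch below (function names hypothetical) shows those definitions and checks that the reported F1 scores are consistent with the reported precision and recall. Because the inputs are already rounded percentages, the recomputed values differ from the published ones by at most 0.01 percentage points.

```python
def harmonic_f1(precision: float, recall: float) -> float:
    """F1 score: harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 from confusion-matrix counts
    (positive class = churned). Zero denominators yield 0.0."""

    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)  # share of predicted churners who really churned
    recall = ratio(tp, tp + fn)     # share of real churners the model caught
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": harmonic_f1(precision, recall),
    }


if __name__ == "__main__":
    # Recompute F1 from the precision and recall reported in the abstract.
    print(round(harmonic_f1(0.7589, 0.907), 4))   # 0.8264 (reported: 82.63%)
    print(round(harmonic_f1(0.8915, 0.9384), 4))  # 0.9143 (reported: 91.44%)
```

The high recall relative to precision in both segments means the model catches most churners at the cost of some false alarms, which is often acceptable when the follow-up action is a relatively cheap retention offer.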
Finally, the Random Forest model is deployed in a web-based application built with Streamlit
in Python.
|