CONTENT-BASED MULTICLASS CLASSIFICATION ON INDONESIAN SMS MESSAGES
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/68614 |
Institution: | Institut Teknologi Bandung |
Summary: | SMS is a text-based communication service that works without an internet
connection and is available on most cellular phones worldwide, including in Indonesia.
SMS is used for many purposes, ranging from ads and notifications to daily conversations.
The convenience of SMS also comes with risks, as fraud messages are commonly
sent to phones. Some types of SMS texts are sent in large volumes, which makes it
difficult for phone users to access certain types of SMS.
In this research, SMS texts are classified into four content types: ads, information,
fraud, and regular. Both shallow learning and deep learning methods are used to
classify the messages, including logistic regression, decision tree, CharCNN
(Zhang et al., 2015), McM (Shakeel et al., 2019), and the pretrained model IndoBERT
(Wilie et al., 2020). In the experiments, IndoBERT-base-p2 outperformed the other
models with a macro-F1 score of 94.05%.
In addition to prediction quality, the storage size and inference time of the best
model are also analyzed on mobile devices. Deployment on Android phones shows
that the IndoBERT model requires 241.34 MB of storage space, with an average inference
time of 0.2279 seconds on a Samsung Galaxy A52s 5G and 0.789 seconds on a Vivo Y65. |
---|
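
The summary reports results as a macro-F1 score over the four content types. As a minimal sketch of how that metric is usually computed (not the author's evaluation code), the Python below averages per-class F1 without weighting by class size. The function name `macro_f1` and the toy label lists are assumptions made for illustration; they are not data from the research.

```python
from collections import Counter

# The four content types named in the summary.
LABELS = ["ads", "information", "fraud", "regular"]


def macro_f1(y_true, y_pred, labels=LABELS):
    """Return the unweighted mean of per-class F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); treated as 0 if the class never occurs.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, invented purely to exercise the function.
y_true = ["ads", "ads", "information", "fraud",
          "fraud", "regular", "regular", "regular"]
y_pred = ["ads", "information", "information", "fraud",
          "regular", "regular", "regular", "ads"]

score = macro_f1(y_true, y_pred)
print(f"macro-F1 = {score:.4f}")
```

Because every class contributes equally to the average, a model that does poorly on a rare class such as fraud is penalized as much as one that does poorly on a frequent class.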