CONTENT-BASED MULTICLASS CLASSIFICATION ON INDONESIAN SMS MESSAGES

Bibliographic Details
Main Author: Roihan Nafiisah, Hasna
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/68614
Institution: Institut Teknologi Bandung
Description
Summary: SMS is a text-based communication service that works without an internet connection and is available on most cellular phones worldwide, including in Indonesia. SMS is used for many purposes, ranging from ads and notifications to daily conversations. The convenience of SMS also comes with risks, as fraud messages are commonly sent to phones. Some types of SMS are sent in large volumes, which makes it difficult for phone users to find messages of a particular type. In this research, SMS texts are classified into four content types: ads, information, fraud, and regular. Both shallow learning and deep learning methods are used to classify the messages, including logistic regression, decision tree, CharCNN (Zhang et al., 2015), McM (Shakeel et al., 2019), and the pretrained model IndoBERT (Wilie et al., 2020). In the experiments, IndoBERT-base-p2 outperformed the other models with a macro-F1 score of 94.05%. In addition to prediction performance, the storage size and inference time of the best model were also analyzed on mobile devices. Deploying the model on Android phones shows that the IndoBERT model requires 241.34 MB of storage, with an average inference time of 0.2279 seconds on a Samsung Galaxy A52s 5G and 0.789 seconds on a Vivo Y65.
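The models are compared using macro-F1, which averages the per-class F1 scores with equal weight. A model therefore cannot score well by handling only the most frequent class, which matters when some SMS types are far more common than others. The sketch below is a minimal illustration of that metric for the four content classes named above. It is not the project's evaluation code. The function name `macro_f1` and the toy labels and predictions are hypothetical and do not come from the project's dataset.

```python
from collections import Counter

# The four content classes named in the abstract.
LABELS = ["ads", "information", "fraud", "regular"]


def macro_f1(y_true, y_pred, labels=LABELS):
    """Hypothetical helper: unweighted mean of per-class F1 scores (macro-F1)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1  # correct prediction for class t
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # missed the true class t

    f1_scores = []
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)

    # Every class counts equally, regardless of how many messages it has.
    return sum(f1_scores) / len(f1_scores)


# Toy example with made-up predictions (not project data).
y_true = ["ads", "ads", "fraud", "information", "regular", "regular"]
y_pred = ["ads", "fraud", "fraud", "information", "regular", "ads"]
print(round(macro_f1(y_true, y_pred), 4))  # → 0.7083
```

In this toy case the per-class F1 scores are 0.5 (ads), 1.0 (information), about 0.667 (fraud), and about 0.667 (regular). Their plain mean is 17/24. A library routine such as scikit-learn's `f1_score(..., average="macro")` computes the same quantity.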