CONTENT-BASED MULTICLASS CLASSIFICATION ON INDONESIAN SMS MESSAGES

Bibliographic Details
Main Author: Roihan Nafiisah, Hasna
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/68614
Institution: Institut Teknologi Bandung
id id-itb.:68614
spelling id-itb.:68614 2022-09-17T08:18:33Z SMS, Indonesian language, classification. text
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
description SMS is a text-based communication service that works without an internet connection and is available on most cellular phones worldwide, including in Indonesia. SMS is used for many purposes, ranging from ads and notifications to daily conversations. This convenience also carries risks, as fraudulent messages are commonly sent to phones. Some types of SMS are sent in large volumes, which makes it difficult for phone users to find messages of a particular type. In this research, SMS texts are classified into four content types: ads, information, fraud, and regular. Both shallow learning and deep learning methods are used to classify the messages, including logistic regression, decision tree, CharCNN (Zhang et al., 2015), McM (Shakeel et al., 2019), and the pretrained model IndoBERT (Wilie et al., 2020). In the experiments, IndoBERT-base-p2 outperformed the other models with a macro-F1 score of 94.05%. In addition to prediction quality, the storage size and inference time of the best model were analyzed on mobile devices. Deployment on Android phones shows that the IndoBERT model occupies 241.34 MB of storage, with an average inference time of 0.2279 seconds on a Samsung Galaxy A52s 5G and 0.789 seconds on a Vivo Y65.
_version_ 1822933701953060864