IMPLEMENTATION OF NEURAL NETWORKS FOR BINARY AND MULTICLASS TEXT CLASSIFICATION IN INDONESIAN LANGUAGE
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/83481 |
Institution: | Institut Teknologi Bandung |
Summary: | The growing role of social media in the digital era means that the dissemination of information in Indonesia is increasing. One of the most popular social media platforms, Twitter (now known as X), is a forum that Indonesians often use to voice their opinions on a range of topics, including politics, economics, society, and culture. At the same time, the spread of fake news remains a significant problem for the country.
This research aims to build artificial neural network models for binary and multiclass classification of Indonesian-language text. Three neural network architectures are built: Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and CNN-LSTM. Binary text classification uses a dataset of 5000 news stories from trusted sources, each labeled "Hoax" or "Valid". Multiclass text classification uses a dataset of 5000 tweets categorized into eight groups.
Data pre-processing in this research consists of text cleaning, tokenization, and word embedding. The resulting models are trained with the Adam optimizer under k-fold cross-validation. The results show that CNN-LSTM performs best of the three architectures, with an accuracy of 96.10% for binary text classification and 94.44% for multiclass text classification. |
---|
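
The pre-processing and validation steps named in the summary (text cleaning, tokenization, k-fold cross-validation) could look roughly like the sketch below. It is an illustration only, not the author's code: the cleaning rules, the helper names (`clean_text`, `tokenize`, `build_vocab`, `encode`, `k_fold_indices`), the padding scheme, and the fold count are all assumptions, since the record does not specify them. The integer sequences produced by `encode` are the kind of input a word-embedding layer in a CNN, LSTM, or CNN-LSTM model would typically consume.

```python
import re

def clean_text(text):
    """Lowercase, then strip URLs, @mentions, and non-letter characters (assumed rules)."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def tokenize(text):
    """Split cleaned text into whitespace-delimited tokens."""
    return clean_text(text).split()

def build_vocab(token_lists):
    """Map each distinct token to an integer id; 0 is padding, 1 is unknown."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(tokens, vocab, max_len):
    """Convert tokens to ids, truncating or right-padding to max_len."""
    ids = [vocab.get(tok, 1) for tok in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))

def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, near-equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size
```

In practice the samples would be shuffled (and possibly stratified by label) before splitting into folds, and one model would be trained and scored per fold; those details are left out here because the summary does not describe them.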