HATE SPEECH DETECTION WITH LOGISTIC REGRESSION AND NAÏVE BAYES CLASSIFIER

Bibliographic Details
Main Author: Lorenzo, Feraldo
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/54809
Institution: Institut Teknologi Bandung
Keywords: Hate Speech, Binary Positioning Array, Naive Bayes Classifier, Logistic Regression
Record date: 2021-06-05
Collection: Digital ITB, Institut Teknologi Bandung Library

Abstract

Hate speech has several key characteristics, such as specific words that carry hateful sentiment, the order of words in a sentence, which builds a particular context, and the frequency with which words occur. The focus of this research is to build a model that captures these characteristics and then classifies a given sentence as hate speech or not. The model learns from sentences taken from the social media platform Twitter, each already labeled as either Hate Speech or Not Hate Speech. Before training, the data is pre-processed: the text is converted to lowercase, excess whitespace is removed, words that carry no useful information (subjects, prepositions, etc.) are removed, and emojis are converted into words related to each emoji. The models used in this analysis are two simple statistical models, Logistic Regression and the Naïve Bayes Classifier. The study compares the performance of the two models when additional features are added to help them learn the characteristics of hate speech. These features include the frequency of word occurrences in the data and variables that record the position of each word in the sentence. Model performance is measured not only by validation metrics such as the AUC score and F1 score, but also by how each model classifies several samples of new sentences.

The experiments found no significant difference in performance between the Logistic Regression model and the Naïve Bayes Classifier. However, the Logistic Regression model is easier to interpret: the values of the coefficients in its logit show which variables have the most significant influence on the model's predictions.
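The abstract names the pre-processing steps and the models but not how they are implemented. As a rough illustration only, the Python sketch below applies the four pre-processing steps in the stated order and fits a plain logistic regression whose coefficients can be read the way the conclusion describes. The stopword list, emoji dictionary, toy sentences, learning rate and epoch count are all hypothetical; the features are simple per-sentence word counts rather than the thesis's frequency and word-position variables, and no Naïve Bayes comparison or AUC/F1 evaluation is shown.

```python
import math
import re

# Hypothetical placeholder resources: the thesis works with Indonesian tweets and
# does not publish its stopword list or emoji dictionary, so these are illustrative.
STOPWORDS = {"i", "you", "they", "we", "a", "and", "in", "on", "to", "of"}
EMOJI_WORDS = {"😡": "angry", "😂": "laugh"}


def preprocess(text):
    """Apply the pre-processing steps listed in the abstract, in order."""
    text = text.lower()                              # convert to lowercase
    for emoji, word in EMOJI_WORDS.items():          # convert emojis to words
        text = text.replace(emoji, f" {word} ")
    text = re.sub(r"\s+", " ", text).strip()         # remove excess whitespace
    return [tok for tok in text.split() if tok not in STOPWORDS]  # drop stopwords


def featurize(tokens, vocab):
    """Bag-of-words counts: how often each vocabulary word occurs in the sentence."""
    return [tokens.count(word) for word in vocab]


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Derivative of the log-loss w.r.t. the logit is (p - y).
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(text, vocab, w, b):
    """Probability that a new sentence is hate speech under the fitted model."""
    x = featurize(preprocess(text), vocab)
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Toy labeled sentences (1 = hate speech, 0 = not), invented for illustration.
train = [
    ("You are stupid and disgusting 😡", 1),
    ("They   are disgusting people", 1),
    ("Such a lovely day 😂", 0),
    ("Such a lovely song", 0),
]
tokens = [preprocess(text) for text, _ in train]
vocab = sorted({tok for toks in tokens for tok in toks})
X = [featurize(toks, vocab) for toks in tokens]
y = [label for _, label in train]
w, b = train_logreg(X, y)

# Positive coefficients push the logit towards "hate speech", negative ones away.
coef = dict(zip(vocab, w))
for word, weight in sorted(coef.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word:>10} {weight:+.3f}")
```

Running it prints the vocabulary sorted by coefficient: words seen only in the toy hate-speech sentences receive positive weights and words seen only in the other class receive negative ones, which is the kind of reading the conclusion attributes to the logit coefficients. For real data, a library implementation such as scikit-learn's `LogisticRegression` (coefficients in `coef_`) and `MultinomialNB` would be the practical choice.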