HATE SPEECH DETECTION WITH LOGISTIC REGRESSION AND NAÏVE BAYES CLASSIFIER
Main Author: | Lorenzo, Feraldo |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/54809 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:54809 |
---|---|
keywords |
Hate Speech, Binary Positioning Array, Naive Bayes Classifier, Logistic Regression |
timestamp |
2021-06-05T14:02:49Z |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesian |
description |
Hate speech has several key characteristics, such as specific words that carry a hateful sentiment, the order of words in a sentence that builds a particular context, and the frequency with which a word occurs. The focus of this research is to build a model that can capture these characteristics and then classify a given sentence as hate speech or not.
Building the model requires data for it to learn from; the data used in this study consists of sentences from the social media platform Twitter, already labeled into two classes, Hate Speech and Not Hate Speech. The data is then pre-processed before the model learns from it. Pre-processing consists of converting the text to lowercase, removing excess whitespace, removing words that carry no information (such as subject pronouns and prepositions), and converting emojis into words related to each emoji.
The models used in this analysis are two simple statistical models: Logistic Regression and the Naïve Bayes Classifier. This study compares the performance of the two models when additional features (variables) are added to help them learn the characteristics of hate speech. The added variables include the frequency of word occurrences in the data and variables that record the position of each word in the sentence. Model performance is measured not only by validation metrics such as the AUC score and F1 score, but also by how each model classifies a sample of new sentences.
The experiments showed no significant difference in performance between the Logistic Regression model and the Naïve Bayes Classifier. However, the Logistic Regression model offers interpretability: the coefficients of the logit reveal which variables have the most significant influence on the model's predictions. |
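A minimal Python sketch of the pre-processing steps listed in the description, assuming simple whitespace tokenization. The stop-word set `STOP_WORDS` and the emoji map `EMOJI_WORDS` are hypothetical placeholders, since the abstract does not list the ones actually used. Emojis are replaced before whitespace is collapsed so that the padding spaces they introduce are cleaned up in the same pass.

```python
import re

# Hypothetical stop-word set: the thesis's actual list is not given in the abstract.
STOP_WORDS = {"i", "you", "he", "she", "we", "they", "it", "in", "on", "at", "to", "of", "the", "a"}

# Hypothetical emoji-to-word map: the thesis's actual mapping is not given either.
EMOJI_WORDS = {"😡": "angry", "😂": "laughing", "👍": "approve"}


def preprocess(sentence):
    """Lowercase, map emojis to words, collapse whitespace, and drop stop words."""
    text = sentence.lower()
    # Replace each emoji with a related word, padded with spaces so it becomes its own token.
    for emoji, word in EMOJI_WORDS.items():
        text = text.replace(emoji, f" {word} ")
    # Collapse runs of whitespace into single spaces and trim both ends.
    text = re.sub(r"\s+", " ", text).strip()
    # Keep only tokens that are not in the stop-word set.
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)
```

For example, `preprocess("They   are SO annoying 😡")` yields `"are so annoying angry"`.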
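The description mentions word-frequency variables and variables that record each word's position; the record's keywords call the latter a "Binary Positioning Array", but the abstract does not define it. The sketch below assumes one plausible reading: per-sentence word counts over a fixed vocabulary, plus one binary indicator per (word, position) pair up to a hypothetical maximum sentence length `max_len`. The function names are illustrative and not taken from the thesis.

```python
from collections import Counter


def count_features(tokens, vocab):
    """Word-frequency features: how many times each vocabulary word occurs in the sentence."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]


def position_features(tokens, vocab, max_len):
    """Hypothetical binary position features: for every (word, position) pair,
    1 if that vocabulary word appears at that position, else 0.
    Tokens outside the vocabulary or beyond max_len are ignored."""
    index = {w: i for i, w in enumerate(vocab)}
    features = [0] * (len(vocab) * max_len)
    for pos, tok in enumerate(tokens[:max_len]):
        if tok in index:
            features[index[tok] * max_len + pos] = 1
    return features
```

Both vectors can be concatenated, e.g. `count_features(tokens, vocab) + position_features(tokens, vocab, max_len=3)`, to form one input row for either classifier. The positional block grows as `len(vocab) * max_len`, which is why a cap such as `max_len` is assumed here.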
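To make the Naïve Bayes side concrete, here is a from-scratch multinomial Naïve Bayes classifier with Laplace (add-alpha) smoothing. This is one common variant chosen for illustration; the abstract does not say which Naïve Bayes variant the thesis used. Documents are token lists, classes are arbitrary labels, and tokens never seen in training are ignored at prediction time.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Estimate log class priors and smoothed log word likelihoods per class."""
    vocab = {w for doc in docs for w in doc}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)
    priors = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    likelihoods = {}
    for c in class_counts:
        # Add-alpha smoothing keeps unseen (word, class) pairs from getting zero probability.
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        likelihoods[c] = {w: math.log((word_counts[c][w] + alpha) / total) for w in vocab}
    return priors, likelihoods


def predict_nb(model, doc):
    """Return the class with the highest log posterior (up to a constant)."""
    priors, likelihoods = model
    scores = {
        c: priors[c] + sum(likelihoods[c][w] for w in doc if w in likelihoods[c])
        for c in priors
    }
    return max(scores, key=scores.get)
```

Working in log space avoids underflow when a sentence has many tokens, since the product of many small probabilities becomes a sum of log-probabilities.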
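The two validation metrics named in the description can be computed directly from true labels and model outputs. In this sketch, F1 is the harmonic mean of precision and recall for the positive class (hate speech, assumed to be labeled 1), and AUC is computed as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The pairwise loop is O(n²) and meant only for illustration.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

F1 depends on a decision threshold (it uses hard predictions), while AUC uses the raw scores and summarizes ranking quality across all thresholds, which is why the two are often reported together.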
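The interpretability claim rests on the logit: logistic regression models the log-odds of hate speech as b + Σⱼ wⱼxⱼ, so each coefficient wⱼ is the change in log-odds per unit increase in feature xⱼ with the other features held fixed. The sketch below fits such a model by plain batch gradient descent on the log-loss (an assumption; the thesis's solver and regularization are not stated) and ranks features by coefficient. The learning rate `lr`, the epoch count, and the helper `rank_coefficients` are hypothetical choices for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=2000):
    """Fit weights w and intercept b by batch gradient descent on the log-loss."""
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Logit: b + sum_j w_j * x_j; its sigmoid is the predicted P(hate speech | x).
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = p - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the mean gradient.
        w = [wj - lr * gj / n_samples for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n_samples
    return w, b


def rank_coefficients(weights, vocab):
    """Pair each feature name with its coefficient, largest (most hate-indicative) first."""
    return sorted(zip(vocab, weights), key=lambda pair: pair[1], reverse=True)
```

Features at the top of the ranking push a sentence toward the Hate Speech class and those at the bottom push it away. Coefficient magnitudes are only comparable when features are on similar scales, so count features may need scaling before this kind of reading.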
format |
Final Project |
author |
Lorenzo, Feraldo |
title |
HATE SPEECH DETECTION WITH LOGISTIC REGRESSION AND NAÏVE BAYES CLASSIFIER |
url |
https://digilib.itb.ac.id/gdl/view/54809 |