HATE SPEECH DETECTION WITH LOGISTIC REGRESSION AND NAÏVE BAYES CLASSIFIER

Hate Speech has several key characteristics, such as a specific word that has a hateful sentiment, the order of words in a sentence so that it builds a certain context or the frequency of occurrence of a word. The focus of this research is to build a model that can understand these characteristics,...

Bibliographic Details
Main Author:	Lorenzo, Feraldo
Format:	Final Project
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/54809
Institution:	Institut Teknologi Bandung

id	id-itb.:54809
spelling	id-itb.:54809 2021-06-05T14:02:49Z HATE SPEECH DETECTION WITH LOGISTIC REGRESSION AND NAÏVE BAYES CLASSIFIER Lorenzo, Feraldo Indonesia Final Project Hate Speech, Binary Positioning Array, Naive Bayes Classifier, Logistic Regression INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/54809 text
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	Hate speech has several key characteristics: specific words that carry hateful sentiment, word order that builds a particular context, and the frequency with which a word occurs. This research builds a model that learns these characteristics and then classifies a given sentence as hate speech or not. The training data consists of sentences from the Twitter social media platform that have already been separated into two classes, Hate Speech and Not Hate Speech. Before training, the data is pre-processed: text is converted to lowercase, excess spaces are removed, words that carry no information (subjects, prepositions, etc.) are removed, and emojis are converted into words related to them. The models used are two simple statistical models, Logistic Regression and the Naïve Bayes Classifier. The study compares the performance of the two models when additional features are added to help them learn the characteristics of hate speech. The added features include the frequency of word occurrences in the data and variables that record the position of each word in the sentence. Performance is measured not only by validation metrics such as the AUC score and F1 score, but also by how the models classify several new sample sentences. The experiments found no significant difference in performance between the Logistic Regression model and the Naïve Bayes Classifier. However, the Logistic Regression model is interpretable: the values of the coefficients in the logit show which variables have the most significant influence on the model's performance.
format	Final Project
author	Lorenzo, Feraldo
spellingShingle	Lorenzo, Feraldo HATE SPEECH DETECTION WITH LOGISTIC REGRESSION AND NAÏVE BAYES CLASSIFIER
author_facet	Lorenzo, Feraldo
author_sort	Lorenzo, Feraldo
title	HATE SPEECH DETECTION WITH LOGISTIC REGRESSION AND NAÏVE BAYES CLASSIFIER
title_short	HATE SPEECH DETECTION WITH LOGISTIC REGRESSION AND NAÏVE BAYES CLASSIFIER
title_full	HATE SPEECH DETECTION WITH LOGISTIC REGRESSION AND NAÏVE BAYES CLASSIFIER
title_fullStr	HATE SPEECH DETECTION WITH LOGISTIC REGRESSION AND NAÏVE BAYES CLASSIFIER
title_full_unstemmed	HATE SPEECH DETECTION WITH LOGISTIC REGRESSION AND NAÏVE BAYES CLASSIFIER
title_sort	hate speech detection with logistic regression and naïve bayes classifier
url	https://digilib.itb.ac.id/gdl/view/54809
_version_	1822929727730483200
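
The pre-processing steps named in the description (lowercasing, removing excess spaces, removing uninformative words, converting emojis to words) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the author's code: the record does not say which stop-word list or emoji mapping was used, so `STOP_WORDS`, `EMOJI_WORDS`, and `preprocess` are hypothetical names with placeholder English entries, and the order of the steps is a choice made here so that converted emoji words become ordinary tokens.

```python
# Hypothetical stop-word list and emoji lookup, for illustration only;
# the catalog record does not say which lists the thesis used.
STOP_WORDS = {"i", "you", "he", "she", "we", "they", "in", "on", "at", "of", "to"}
EMOJI_WORDS = {"\U0001F620": "angry", "\U0001F602": "laugh"}


def preprocess(sentence: str) -> str:
    """Apply the four cleaning steps listed in the description."""
    # Convert the text to lowercase.
    text = sentence.lower()
    # Replace each known emoji with a related word, padded with spaces
    # so that the word becomes a separate token.
    for emoji, word in EMOJI_WORDS.items():
        text = text.replace(emoji, f" {word} ")
    # Split on whitespace and drop words that carry little information.
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    # Joining on single spaces also removes any excess whitespace.
    return " ".join(tokens)


if __name__ == "__main__":
    print(preprocess("  I   HATE waiting in line \U0001F620 "))
```

The cleaned string could then be passed to whatever feature extraction feeds the two classifiers; that part is not shown because the record gives no detail on how the frequency and position variables are encoded.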

Similar Items

HATE SPEECH DETECTION WITH LOGISTIC REGRESSION AND NAÏVE BAYES CLASSIFIER