Anatomy of online hate: Developing a taxonomy and machine learning models for identifying and classifying hate in online news media

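Online social media platforms generally attempt to mitigate hateful expressions, as these comments can be detrimental to the health of the community. However, automatically identifying hateful comments can be challenging. We manually label 5,143 hateful expressions posted to YouTube and Facebook videos among a dataset of 137,098 comments from an online news media. We then create a granular taxonomy of different types and targets of online hate and train machine learning models to automatically detect and classify the hateful comments in the full dataset. Our contribution is twofold: 1) creating a granular taxonomy for hateful online comments that includes both types and targets of hateful comments, and 2) experimenting with machine learning, including Logistic Regression, Decision Tree, Random Forest, Adaboost, and Linear SVM, to generate a multiclass, multilabel classification model that automatically detects and categorizes hateful comments in the context of online news media. We find that the best performing model is Linear SVM, with an average F1 score of 0.79 using TF-IDF features. We validate the model by testing its predictive ability, and, relatedly, provide insights on distinct types of hate speech taking place on social media.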
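
The record's description reports an average F1 score for a multiclass, multilabel model. As a rough illustration of how such an average can be computed, the sketch below scores each label one-vs-rest and takes the unweighted (macro) mean. This is an assumption: the description does not say which averaging scheme the authors used. The function names and the labels `type_a`, `type_b`, and `target_x` are hypothetical placeholders, not categories from the paper's taxonomy.

```python
from typing import Dict, List, Set


def f1_per_label(y_true: List[Set[str]], y_pred: List[Set[str]]) -> Dict[str, float]:
    """Compute a one-vs-rest F1 score for every label seen in truth or predictions."""
    labels = set().union(*y_true, *y_pred)
    scores: Dict[str, float] = {}
    for label in sorted(labels):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if label in t and label in p)
        fp = sum(1 for t, p in pairs if label not in t and label in p)
        fn = sum(1 for t, p in pairs if label in t and label not in p)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); fall back to 0.0 if the denominator is zero.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Unweighted mean of the per-label F1 scores (assumed averaging scheme)."""
    scores = f1_per_label(y_true, y_pred)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Hypothetical toy data: each comment carries a set of labels (possibly empty).
y_true = [{"type_a", "target_x"}, {"type_b"}, set(), {"type_a"}]
y_pred = [{"type_a"}, {"type_b"}, {"type_a"}, {"type_a"}]

print(f"{macro_f1(y_true, y_pred):.2f}")  # prints 0.60
```

In the toy data, `type_a` scores 0.8, `type_b` scores 1.0, and `target_x` scores 0.0 because it is never predicted, so the macro average is 0.6. A micro average, which pools counts across labels before computing F1, would give a different number on the same data.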

Bibliographic Details
Main Authors: SALMINEN, Joni, ALMEREKHI, Hind, MILENKOVIC, Milica, JUNG, Soon-Gyu, KWAK, Haewoon, JANSEN, Bernard J.
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2018
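Subjects: Online hate; toxic comments; social media; machine learning; Databases and Information Systems; Social Media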
Online Access: https://ink.library.smu.edu.sg/sis_research/5336
https://ink.library.smu.edu.sg/context/sis_research/article/6340/viewcontent/anatomy_of_online.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-6340
record_format dspace
spelling sg-smu-ink.sis_research-6340
2020-10-30T03:19:37Z
2018-01-01T08:00:00Z
text
application/pdf
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Online hate
toxic comments
social media
machine learning
Databases and Information Systems
Social Media
description Online social media platforms generally attempt to mitigate hateful expressions, as these comments can be detrimental to the health of the community. However, automatically identifying hateful comments can be challenging. We manually label 5,143 hateful expressions posted to YouTube and Facebook videos among a dataset of 137,098 comments from an online news media. We then create a granular taxonomy of different types and targets of online hate and train machine learning models to automatically detect and classify the hateful comments in the full dataset. Our contribution is twofold: 1) creating a granular taxonomy for hateful online comments that includes both types and targets of hateful comments, and 2) experimenting with machine learning, including Logistic Regression, Decision Tree, Random Forest, Adaboost, and Linear SVM, to generate a multiclass, multilabel classification model that automatically detects and categorizes hateful comments in the context of online news media. We find that the best performing model is Linear SVM, with an average F1 score of 0.79 using TF-IDF features. We validate the model by testing its predictive ability, and, relatedly, provide insights on distinct types of hate speech taking place on social media.
format text
author SALMINEN, Joni
ALMEREKHI, Hind
MILENKOVIC, Milica
JUNG, Soon-Gyu
KWAK, Haewoon
JANSEN, Bernard J.
author_sort SALMINEN, Joni
title Anatomy of online hate: Developing a taxonomy and machine learning models for identifying and classifying hate in online news media
publisher Institutional Knowledge at Singapore Management University
publishDate 2018
url https://ink.library.smu.edu.sg/sis_research/5336
https://ink.library.smu.edu.sg/context/sis_research/article/6340/viewcontent/anatomy_of_online.pdf
_version_ 1770575408008265728