FALCoN: Detecting and classifying abusive language in social networks using context features and unlabeled data

Social networks have grown into a widespread form of communication that allows a large number of users to participate in conversations and consume information at any time. The casual nature of social media allows for nonstandard terminology, some of which may be considered rude and derogatory. As a...

Bibliographic Details
Main Author:	Tuarob S.
Other Authors:	Mahidol University
Format:	Article
Published:	2023
Subjects:	Decision Sciences
Online Access:	https://repository.li.mahidol.ac.th/handle/123456789/81322
Institution:	Mahidol University

id	th-mahidol.81322
record_format	dspace
spelling	th-mahidol.81322 2023-05-16T00:22:22Z FALCoN: Detecting and classifying abusive language in social networks using context features and unlabeled data Tuarob S. Mahidol University Decision Sciences 2023-05-15T17:22:22Z 2023-05-15T17:22:22Z 2023-07-01 Article Information Processing and Management Vol.60 No.4 (2023) 10.1016/j.ipm.2023.103381 03064573 2-s2.0-85153572822 https://repository.li.mahidol.ac.th/handle/123456789/81322 SCOPUS
institution	Mahidol University
building	Mahidol University Library
continent	Asia
country	Thailand
content_provider	Mahidol University Library
collection	Mahidol University Institutional Repository
topic	Decision Sciences
spellingShingle	Decision Sciences Tuarob S. FALCoN: Detecting and classifying abusive language in social networks using context features and unlabeled data
description	Social networks have grown into a widespread form of communication that allows a large number of users to participate in conversations and consume information at any time. The casual nature of social media allows for nonstandard terminology, some of which may be considered rude and derogatory. As a result, a significant portion of social media users is found to express disrespectful language. This problem may intensify in certain developing countries where young children are granted unsupervised access to social media platforms. Furthermore, the sheer amount of social media data generated daily by millions of users makes it impractical for humans to monitor and regulate inappropriate content. If adolescents are exposed to these harmful language patterns without adequate supervision, they may feel obliged to adopt them. In addition, unrestricted aggression in online forums may result in cyberbullying and other dreadful occurrences. While computational linguistics research has addressed the difficulty of detecting abusive dialogues, issues remain unanswered for low-resource languages with little annotated data, leading the majority of supervised techniques to perform poorly. In addition, social media content is often presented in complex, context-rich formats that encourage creative user involvement. Therefore, we propose to improve the performance of abusive language detection and classification in a low-resource setting, using both the abundant unlabeled data and the context features via the co-training protocol that enables two machine learning models, each learning from an orthogonal set of features, to teach each other, resulting in an overall performance improvement. Empirical results reveal that our proposed framework achieves F1 values of 0.922 and 0.827, surpassing the state-of-the-art baselines by 3.32% and 45.85% for binary and fine-grained classification tasks, respectively. In addition to proving the efficacy of co-training in a low-resource situation for abusive language detection and classification tasks, the findings shed light on several opportunities to use unlabeled data and contextual characteristics of social networks in a variety of social computing applications.
author2	Mahidol University
author_facet	Mahidol University Tuarob S.
format	Article
author	Tuarob S.
author_sort	Tuarob S.
title	FALCoN: Detecting and classifying abusive language in social networks using context features and unlabeled data
title_short	FALCoN: Detecting and classifying abusive language in social networks using context features and unlabeled data
title_full	FALCoN: Detecting and classifying abusive language in social networks using context features and unlabeled data
title_fullStr	FALCoN: Detecting and classifying abusive language in social networks using context features and unlabeled data
title_full_unstemmed	FALCoN: Detecting and classifying abusive language in social networks using context features and unlabeled data
title_sort	falcon: detecting and classifying abusive language in social networks using context features and unlabeled data
publishDate	2023
url	https://repository.li.mahidol.ac.th/handle/123456789/81322
_version_	1781414517314420736

Similar Items