BUILDING SEXISM DETECTION AND CLASSIFICATION MODEL FOR SOCIAL MEDIA TEXT USING ROBERTA AND DATA AUGMENTATION

Sexism refers to actions based on the belief that members of one sex are less intelligent, able, or skillful than members of the other sex, especially the belief that women are less able than men. Today, sexism is common on social media because of the lack of consequences given when a...

Bibliographic Details
Main Author:	Tri Rahutami, Gayuh
Format:	Final Project
Language:	Indonesian
Online Access:	https://digilib.itb.ac.id/gdl/view/74111
Institution:	Institut Teknologi Bandung

id	id-itb.:74111
timestamp	2023-06-26T13:05:14Z
keywords	sexism, text classification, social media, RoBERTa
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesian
description	Sexism refers to actions based on the belief that members of one sex are less intelligent, able, or skillful than members of the other sex, especially the belief that women are less able than men. Today, sexism is common on social media because users face few consequences for sexist behavior. To counter this trend, an organization called Rewire held a SemEval 2023 competition titled Toward Explainable Detection of Online Sexism (EDOS), whose goal is to build models that detect sexism in social media text and classify the text into four general categories and eleven specific categories. In this final-year project, three artificial neural network models, one for each of these tasks, are built on the transformer-based model RoBERTa. The provided dataset is imbalanced, which leaves the models unable to predict some of the categories that have fewer examples than the others. To address this, data augmentation experiments are carried out to improve the models' performance. Four experimental settings are compared: no data augmentation, random oversampling, easy data augmentation, and back-translation. The experiments show that data augmentation improves performance on both category and sub-category classification: the F1 score rises from 0.29 to 0.66 on the category classification task and from 0.18 to 0.51 on the sub-category classification task. Further analysis shows that the sexist texts that were predicted correctly tend to contain many derogatory terms.
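As an illustration of the simplest of the augmentation settings named above, random oversampling can be sketched as follows. This is a minimal sketch, not the project's implementation: the record does not say how oversampling was carried out, and the function name `random_oversample`, the toy texts, and the label names are hypothetical.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen examples of the smaller classes until
    every class has as many examples as the largest class."""
    rng = random.Random(seed)

    # Group the texts by their label.
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_label.values())

    out_texts, out_labels = [], []
    for label, items in by_label.items():
        # Sample with replacement to fill the gap (empty for the largest class).
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Toy, made-up data; the label names are placeholders, not the EDOS categories.
texts = ["a", "b", "c", "d", "e", "f"]
labels = ["cat_1", "cat_1", "cat_1", "cat_1", "cat_2", "cat_3"]

balanced_texts, balanced_labels = random_oversample(texts, labels)
print(Counter(balanced_labels))
```

In practice, oversampling of this kind is applied to the training split only, so that duplicated examples do not leak into the evaluation data.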
format	Final Project
author	Tri Rahutami, Gayuh
title	BUILDING SEXISM DETECTION AND CLASSIFICATION MODEL FOR SOCIAL MEDIA TEXT USING ROBERTA AND DATA AUGMENTATION


Similar Items