BUILDING SEXISM DETECTION AND CLASSIFICATION MODEL FOR SOCIAL MEDIA TEXT USING ROBERTA AND DATA AUGMENTATION

Bibliographic Details
Main Author: Tri Rahutami, Gayuh
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/74111
Institution: Institut Teknologi Bandung
id id-itb.:74111
spelling id-itb.:74111 2023-06-26T13:05:14Z sexism, text classification, social media, RoBERTa
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description Sexism refers to actions based on the belief that members of one sex are less intelligent, able, or skillful than members of the other sex, especially the belief that women are less able than men. Today, sexism is common on social media because users face few consequences for sexist acts. To counter this trend, an organization called Rewire held a competition at SemEval 2023 titled Toward Explainable Detection of Online Sexism (EDOS), whose goal is to create a model that detects sexism in social media text and also classifies the text into four general categories and eleven specific categories. In this final-year project, three artificial neural network models, one for each of these tasks, are built on RoBERTa, a transformer-based model. The provided dataset is imbalanced, which leaves the models unable to predict some of the categories that have fewer examples than the others. To address this, data augmentation experiments are carried out to improve the models' performance. Four settings are compared: no data augmentation, random oversampling, easy data augmentation, and back-translation. The experiments show that data augmentation improves the performance of both category classification and sub-category classification. In the category classification task, data augmentation raises the F1 score from 0.29 to 0.66; in the sub-category classification task, it raises the F1 score from 0.18 to 0.51. Further analysis shows that the sexist texts that were predicted correctly tend to contain many derogatory terms.
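The description above names random oversampling as one remedy for class imbalance and reports results as F1 scores. The following is a minimal sketch of both ideas using only the Python standard library. It is not the author's implementation: the function names `random_oversample` and `macro_f1` are hypothetical, the labels are placeholders, and macro-averaging is an assumption, since the description does not say which F1 variant was used.

```python
import random
from collections import Counter, defaultdict


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen minority-class examples until every
    class has as many examples as the largest class (illustrative helper)."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)
    target = max(len(items) for items in by_label.values())
    out_texts, out_labels = [], []
    for label, items in by_label.items():
        # Sample with replacement to fill the gap up to the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true
    (assumed metric; the description only says 'F1 score')."""
    scores = []
    for cls in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Placeholder data: class "A" is three times as frequent as class "B".
    texts = ["t1", "t2", "t3", "t4"]
    labels = ["A", "A", "A", "B"]
    new_texts, new_labels = random_oversample(texts, labels)
    print(Counter(new_labels))
    # A classifier that never predicts the rare class scores poorly on macro F1.
    print(macro_f1(["A", "A", "B"], ["A", "A", "A"]))
```

Oversampling of this kind is normally applied to the training split only, so that duplicated examples do not leak into the evaluation data.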
format Final Project
author Tri Rahutami, Gayuh
title BUILDING SEXISM DETECTION AND CLASSIFICATION MODEL FOR SOCIAL MEDIA TEXT USING ROBERTA AND DATA AUGMENTATION
url https://digilib.itb.ac.id/gdl/view/74111