ADVERSARIAL ROBUSTNESS TESTING ON PRETRAINED LANGUAGE MODEL IN TEXT CLASSIFICATION

Text representation technology is growing with the existence of monolingual and multilingual pre-trained language models, which are increasingly used, especially for text classification problems. Two phenomena that occur in textual data are code-mixing and synonym replacement. Given the increasingly vital role of language models, further evaluation is needed to determine whether existing pre-trained language models are good enough to handle such variation in textual communication. One technique that can be used is the adversarial attack. An adversarial attack (Jin et al., 2020) can find the words that contribute most to a model's label prediction (vulnerable words). Using this technique, the vulnerable words are translated to simulate code-mixing and synonym-replacement perturbations. The perturbed text is evaluated with a semantic similarity score to preserve its meaning. Experiments were carried out on two text classification tasks, and the results showed that all language models experienced a decrease in performance. When Indonesian is code-mixed with foreign languages unrelated to Indonesian, the XLM-R model outperforms the IndoBERT model, while for code-mixing with languages related to Indonesian, IndoBERT outperforms XLM-R. The experimental results also show that increasing model size increases robustness.

Bibliographic Details
Main Author: Farid Adilazuarda, Muhammad
Format: Final Project
Language: Indonesian
Online Access:https://digilib.itb.ac.id/gdl/view/65818
Institution: Institut Teknologi Bandung
Language: Indonesian
id id-itb.:65818
spelling id-itb.:658182022-06-25T03:37:10ZADVERSARIAL ROBUSTNESS TESTING ON PRETRAINED LANGUAGE MODEL IN TEXT CLASSIFICATION Farid Adilazuarda, Muhammad Indonesia Final Project robustness, language model, adversarial attack INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/65818 Text representation technology is growing with the existence of monolingual and multilingual pre-trained language models. Language models are increasingly being used, especially on text classification problems. Two phenomena that occur in textual data are code-mixing and synonym replacement. With the increasingly vital role of language model training, it is necessary to further evaluate whether existing pre-trained language models are good enough to handle various cases in textual communication. One technique that can be used is the adversarial attack. An adversarial attack (Jin et al., 2020) can find the words that contribute most to a model's label prediction (vulnerable words). Using the adversarial attack technique, these vulnerable words are translated to simulate code-mixing and synonym-replacement perturbation. The perturbed text is evaluated with a semantic similarity score to preserve its semantic meaning. Experiments were carried out with two text classification tasks, and the results showed that all language models experienced a decrease in performance. In the case of code-mixing Indonesian with foreign languages that are not related to Indonesian, the XLM-R model outperforms the IndoBERT model, while in the case of code-mixing with languages related to Indonesian, the IndoBERT model outperforms the XLM-R model. Experimental results also show that increasing the size of the model increases its robustness. text
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description Text representation technology is growing with the existence of monolingual and multilingual pre-trained language models. Language models are increasingly being used, especially on text classification problems. Two phenomena that occur in textual data are code-mixing and synonym replacement. With the increasingly vital role of language model training, it is necessary to further evaluate whether existing pre-trained language models are good enough to handle various cases in textual communication. One technique that can be used is the adversarial attack. An adversarial attack (Jin et al., 2020) can find the words that contribute most to a model's label prediction (vulnerable words). Using the adversarial attack technique, these vulnerable words are translated to simulate code-mixing and synonym-replacement perturbation. The perturbed text is evaluated with a semantic similarity score to preserve its semantic meaning. Experiments were carried out with two text classification tasks, and the results showed that all language models experienced a decrease in performance. In the case of code-mixing Indonesian with foreign languages that are not related to Indonesian, the XLM-R model outperforms the IndoBERT model, while in the case of code-mixing with languages related to Indonesian, the IndoBERT model outperforms the XLM-R model. Experimental results also show that increasing the size of the model increases its robustness.
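The attack procedure described above — ranking words by their contribution to the prediction, translating the most vulnerable ones, and filtering candidates with a semantic similarity score — can be sketched as follows. Everything here is a toy stand-in: the keyword classifier, the one-entry bilingual dictionary, and the Jaccard similarity are hypothetical placeholders for the thesis's fine-tuned IndoBERT/XLM-R models and its actual similarity scorer; only the deletion-based importance ranking follows the general recipe of Jin et al. (2020).

```python
import math

def toy_classifier(tokens):
    """Hypothetical sentiment scorer: probability of the 'positive' label."""
    positive = {"good", "great", "excellent"}
    negative = {"bad", "poor", "terrible"}
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    return 1 / (1 + math.exp(-score))

def word_importance(tokens, classify):
    """Rank words by how much deleting each one shifts the prediction."""
    base = classify(tokens)
    scores = []
    for i in range(len(tokens)):
        reduced = tokens[:i] + tokens[i + 1:]
        scores.append((abs(base - classify(reduced)), i))
    return sorted(scores, reverse=True)  # most vulnerable words first

def jaccard_similarity(a, b):
    """Crude token-overlap proxy for the thesis's semantic similarity score."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def code_mix_attack(tokens, classify, translations, sim_threshold=0.5):
    """Translate vulnerable words (simulating code-mixing) until the label flips."""
    original_label = classify(tokens) > 0.5
    attacked = list(tokens)
    for _, i in word_importance(tokens, classify):
        word = attacked[i]
        if word in translations:
            candidate = attacked[:i] + [translations[word]] + attacked[i + 1:]
            # Keep the perturbation only if it stays semantically close.
            if jaccard_similarity(tokens, candidate) >= sim_threshold:
                attacked = candidate
                if (classify(attacked) > 0.5) != original_label:
                    break  # prediction flipped; attack succeeded
    return attacked

sentence = ["the", "movie", "was", "good"]
translations = {"good": "bagus"}  # hypothetical English->Indonesian entry
print(code_mix_attack(sentence, toy_classifier, translations))
```

In this sketch, "good" is found to be the most vulnerable word, and replacing it with its translation flips the toy classifier's prediction while preserving most of the token overlap, mirroring the code-mixing perturbation evaluated in the thesis.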
format Final Project
author Farid Adilazuarda, Muhammad
spellingShingle Farid Adilazuarda, Muhammad
ADVERSARIAL ROBUSTNESS TESTING ON PRETRAINED LANGUAGE MODEL IN TEXT CLASSIFICATION
author_facet Farid Adilazuarda, Muhammad
author_sort Farid Adilazuarda, Muhammad
title ADVERSARIAL ROBUSTNESS TESTING ON PRETRAINED LANGUAGE MODEL IN TEXT CLASSIFICATION
title_short ADVERSARIAL ROBUSTNESS TESTING ON PRETRAINED LANGUAGE MODEL IN TEXT CLASSIFICATION
title_full ADVERSARIAL ROBUSTNESS TESTING ON PRETRAINED LANGUAGE MODEL IN TEXT CLASSIFICATION
title_fullStr ADVERSARIAL ROBUSTNESS TESTING ON PRETRAINED LANGUAGE MODEL IN TEXT CLASSIFICATION
title_full_unstemmed ADVERSARIAL ROBUSTNESS TESTING ON PRETRAINED LANGUAGE MODEL IN TEXT CLASSIFICATION
title_sort adversarial robustness testing on pretrained language model in text classification
url https://digilib.itb.ac.id/gdl/view/65818
_version_ 1822932860450897920