ADVERSARIAL ROBUSTNESS TESTING ON PRETRAINED LANGUAGE MODEL IN TEXT CLASSIFICATION

Text representation technology is growing with the existence of monolingual and multilingual pre-trained language models, which are increasingly used, especially for text classification problems. Two phenomena that occur in textual data are code-mixing and synonym replacement. Given the increasingly vital role of language models, further evaluation is needed to determine whether existing pre-trained language models are good enough to handle such variation in textual communication. One technique that can be used is the adversarial attack. An adversarial attack (Jin et al., 2020) can find the words that contribute most to a model's label prediction (vulnerable words). Using this technique, the vulnerable words are translated to simulate code-mixing and synonym-replacement perturbations. The perturbed text is evaluated with a semantic similarity score to preserve its meaning. Experiments were carried out on two text classification tasks, and the results showed that all language models experienced a decrease in performance. When Indonesian is code-mixed with foreign languages unrelated to Indonesian, the XLM-R model outperforms the IndoBERT model, while for code-mixing with languages related to Indonesian, IndoBERT outperforms XLM-R. The experimental results also show that increasing model size increases robustness.

Bibliographic Details
Main Author: Farid Adilazuarda, Muhammad
Format: Final Project
Language: Indonesian
Online Access:https://digilib.itb.ac.id/gdl/view/65818
Institution: Institut Teknologi Bandung
Language: Indonesian
id id-itb.:65818
spelling id-itb.:658182022-06-25T03:37:10ZADVERSARIAL ROBUSTNESS TESTING ON PRETRAINED LANGUAGE MODEL IN TEXT CLASSIFICATION Farid Adilazuarda, Muhammad Indonesia Final Project robustness, language model, adversarial attack INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/65818 Text representation technology is growing with the existence of monolingual and multilingual pre-trained language models. Language models are increasingly being used, especially on text classification problems. Two phenomena that occur in textual data are code-mixing and synonym replacement. With the increasingly vital role of language model training, it is necessary to further evaluate whether existing pre-trained language models are good enough to handle various cases in textual communication. One technique that can be used is the adversarial attack. An adversarial attack (Jin et al., 2020) can find the words that contribute most to a model's label prediction (vulnerable words). Using the adversarial attack technique, these vulnerable words are translated to simulate code-mixing and synonym-replacement perturbation. The perturbed text is evaluated with a semantic similarity score to preserve its semantic meaning. Experiments were carried out with two text classification tasks, and the results showed that all language models experienced a decrease in performance. In the case of code-mixing Indonesian with foreign languages that are not related to Indonesian, the XLM-R model outperforms the IndoBERT model, while in the case of code-mixing with languages related to Indonesian, the IndoBERT model outperforms the XLM-R model. Experimental results also show that increasing the size of the model increases its robustness. text
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description Text representation technology is growing with the existence of monolingual and multilingual pre-trained language models. Language models are increasingly being used, especially on text classification problems. Two phenomena that occur in textual data are code-mixing and synonym replacement. With the increasingly vital role of language model training, it is necessary to further evaluate whether existing pre-trained language models are good enough to handle various cases in textual communication. One technique that can be used is the adversarial attack. An adversarial attack (Jin et al., 2020) can find the words that contribute most to a model's label prediction (vulnerable words). Using the adversarial attack technique, these vulnerable words are translated to simulate code-mixing and synonym-replacement perturbation. The perturbed text is evaluated with a semantic similarity score to preserve its semantic meaning. Experiments were carried out with two text classification tasks, and the results showed that all language models experienced a decrease in performance. In the case of code-mixing Indonesian with foreign languages that are not related to Indonesian, the XLM-R model outperforms the IndoBERT model, while in the case of code-mixing with languages related to Indonesian, the IndoBERT model outperforms the XLM-R model. Experimental results also show that increasing the size of the model increases its robustness.
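The attack procedure described above — ranking words by their contribution to the prediction, translating the most vulnerable ones, and filtering candidates with a semantic similarity score — can be sketched as follows. Everything here is a toy stand-in: the keyword classifier, the one-entry bilingual dictionary, and the Jaccard similarity are hypothetical placeholders for the thesis's fine-tuned IndoBERT/XLM-R models and its actual similarity scorer; only the deletion-based importance ranking follows the general recipe of Jin et al. (2020).

```python
import math

def toy_classifier(tokens):
    """Hypothetical sentiment scorer: probability of the 'positive' label."""
    positive = {"good", "great", "excellent"}
    negative = {"bad", "poor", "terrible"}
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    return 1 / (1 + math.exp(-score))

def word_importance(tokens, classify):
    """Rank words by how much deleting each one shifts the prediction."""
    base = classify(tokens)
    scores = []
    for i in range(len(tokens)):
        reduced = tokens[:i] + tokens[i + 1:]
        scores.append((abs(base - classify(reduced)), i))
    return sorted(scores, reverse=True)  # most vulnerable words first

def jaccard_similarity(a, b):
    """Crude token-overlap proxy for the thesis's semantic similarity score."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def code_mix_attack(tokens, classify, translations, sim_threshold=0.5):
    """Translate vulnerable words (simulating code-mixing) until the label flips."""
    original_label = classify(tokens) > 0.5
    attacked = list(tokens)
    for _, i in word_importance(tokens, classify):
        word = attacked[i]
        if word in translations:
            candidate = attacked[:i] + [translations[word]] + attacked[i + 1:]
            # Keep the perturbation only if it stays semantically close.
            if jaccard_similarity(tokens, candidate) >= sim_threshold:
                attacked = candidate
                if (classify(attacked) > 0.5) != original_label:
                    break  # prediction flipped; attack succeeded
    return attacked

sentence = ["the", "movie", "was", "good"]
translations = {"good": "bagus"}  # hypothetical English->Indonesian entry
print(code_mix_attack(sentence, toy_classifier, translations))
```

In this sketch, "good" is found to be the most vulnerable word, and replacing it with its translation flips the toy classifier's prediction while preserving most of the token overlap, mirroring the code-mixing perturbation evaluated in the thesis.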
format Final Project
author Farid Adilazuarda, Muhammad
spellingShingle Farid Adilazuarda, Muhammad
ADVERSARIAL ROBUSTNESS TESTING ON PRETRAINED LANGUAGE MODEL IN TEXT CLASSIFICATION
author_facet Farid Adilazuarda, Muhammad
author_sort Farid Adilazuarda, Muhammad
title ADVERSARIAL ROBUSTNESS TESTING ON PRETRAINED LANGUAGE MODEL IN TEXT CLASSIFICATION
title_short ADVERSARIAL ROBUSTNESS TESTING ON PRETRAINED LANGUAGE MODEL IN TEXT CLASSIFICATION
title_full ADVERSARIAL ROBUSTNESS TESTING ON PRETRAINED LANGUAGE MODEL IN TEXT CLASSIFICATION
title_fullStr ADVERSARIAL ROBUSTNESS TESTING ON PRETRAINED LANGUAGE MODEL IN TEXT CLASSIFICATION
title_full_unstemmed ADVERSARIAL ROBUSTNESS TESTING ON PRETRAINED LANGUAGE MODEL IN TEXT CLASSIFICATION
title_sort adversarial robustness testing on pretrained language model in text classification
url https://digilib.itb.ac.id/gdl/view/65818
_version_ 1822932860450897920