Prompt sensitivity of transformer variants for text classification

This study investigates the sensitivity of three Transformer architectures, encoder-only (BERT), decoder-only (GPT-2), and encoder-decoder (T5), to different types of prompt modifications on text classification tasks. Using a fine-tuning approach, the models were evaluated on selected benchmark datasets from GLUE, with modifications encompassing lexical, positioning, and syntactic changes. The findings reveal that the encoder-based models (BERT and T5) are more sensitive to prompt modifications than the decoder-only model (GPT-2), with impacts varying by task and modification type. We reason that the fully bidirectional encoder self-attention mechanism causes these models to overfit to subtle linguistic artifacts in the training data, reducing their ability to generalise to unseen examples. We therefore recommend that production models handling potentially unpredictable input (i.e. client-facing applications) be trained on more diverse data to enhance robustness, obtained either through manual collection or through noise-based data augmentation such as the prompt modification techniques covered in this study. Future research should explore additional modification categories, tasks, and scalability effects across larger models.
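
For illustration only, a minimal sketch of what the three modification categories named above (lexical, positioning, and syntactic changes) could look like when applied as noise-based data augmentation to a classification prompt. The concrete substitutions and function names below are hypothetical assumptions, not the study's actual modification set, which is detailed in the full report.

# Hypothetical sketch; the thesis's exact modifications are not listed in this record.
def lexical_modification(prompt: str) -> str:
    # Lexical change: swap a word in the prompt template for a synonym.
    synonyms = {"sentence": "statement", "Classify": "Determine", "sentiment": "polarity"}
    for word, replacement in synonyms.items():
        if word in prompt:
            return prompt.replace(word, replacement, 1)
    return prompt

def positioning_modification(instruction: str, text: str) -> str:
    # Positioning change: move the task instruction from before the input text to after it.
    return f"{text} {instruction}"

def syntactic_modification(prompt: str) -> str:
    # Syntactic change: rephrase the imperative instruction as a question.
    return prompt.replace("Classify the sentiment of this sentence:",
                          "What is the sentiment of this sentence?")

if __name__ == "__main__":
    instruction = "Classify the sentiment of this sentence:"
    text = "The film was a delight."
    base = f"{instruction} {text}"
    print(lexical_modification(base))                   # swaps "sentence" for "statement"
    print(positioning_modification(instruction, text))  # instruction now follows the text
    print(syntactic_modification(base))                 # imperative rewritten as a question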

Bibliographic Details
Main Author: Ong, Li Han
Other Authors: Wang Wenya
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2024
Subjects: Computer and Information Science; Machine learning; Classification; Model sensitivity; Large language models
Online Access: https://hdl.handle.net/10356/181519
Record ID: sg-ntu-dr.10356-181519
School: College of Computing and Data Science
Contact: Wang Wenya (wangwy@ntu.edu.sg)
Degree: Bachelor's degree
Project Code: SCSE23-1030
Record Created: 2024-12-09
File Format: application/pdf
Citation: Ong, L. H. (2024). Prompt sensitivity of transformer variants for text classification. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/181519
Collection: DR-NTU, NTU Library, Nanyang Technological University, Singapore