Tailored text augmentation for sentiment analysis

In synonym replacement-based data augmentation techniques for natural language processing tasks, words in a sentence are often sampled randomly with equal probability. In this paper, we propose a novel data augmentation technique named Tailored Text Augmentation (TTA) for sentiment analysis. It has...


Bibliographic Details
Main Authors:	Feng, Zijian; Zhou, Hanzhang; Zhu, Zixiao; Mao, Kezhi
Other Authors:	School of Electrical and Electronic Engineering
Format:	Article
Language:	English
Published:	2022
Subjects:	Engineering::Electrical and electronic engineering; Sentiment Analysis; Text Augmentation
Online Access:	https://hdl.handle.net/10356/162087
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-162087
record_format	dspace
affiliations	School of Electrical and Electronic Engineering; Interdisciplinary Graduate School (IGS)
type	Journal Article
record_date	2022-10-04T02:29:45Z
funder	National Research Foundation (NRF)
funding_note	This work is an outcome of the Future Resilient Systems project at Singapore-ETH Centre (SEC) supported by the National Research Foundation, Prime Minister’s Office, Singapore under its Campus for Research Excellence and Technological Enterprise (CREATE) programme.
citation	Feng, Z., Zhou, H., Zhu, Z. & Mao, K. (2022). Tailored text augmentation for sentiment analysis. Expert Systems with Applications, 205, 117605.
doi	https://dx.doi.org/10.1016/j.eswa.2022.117605
issn	0957-4174
scopus_eid	2-s2.0-85131964557
rights	© 2022 Elsevier Ltd. All rights reserved.
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
description	In synonym replacement-based data augmentation techniques for natural language processing tasks, words in a sentence are often sampled randomly with equal probability. In this paper, we propose a novel data augmentation technique named Tailored Text Augmentation (TTA) for sentiment analysis. It has two main operations. The first operation is the probabilistic word sampling for synonym replacement based on the discriminative power and relevance of the word to sentiment. The second operation is the identification of words irrelevant to sentiment but discriminative for the training data, and application of zero masking or contextual replacement to these words. The first operation expands the coverage of discriminative words, while the second operation alleviates the problem of misfitting. Both operations tend to improve the model's generalization capability. Extensive experiments on simulated low-data regimes demonstrate that TTA yields notable improvements over six strong baselines. Finally, TTA is applied to public sentiment analysis on measures against Covid-19, which again proves the effectiveness of the new data augmentation algorithm.
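The two operations summarized in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the discriminative-power measure (absolute difference of smoothed class log-odds), the lexicon-based relevance score, the toy lexicon and synonym table, and the `<zero>` placeholder token are all hypothetical choices made for illustration. Contextual replacement, the alternative to zero masking named in the abstract, would need a language model and is omitted.

```python
import math
import random
from collections import Counter

# Hypothetical toy resources, NOT from the paper: a tiny sentiment lexicon and a
# tiny synonym table (including non-sentiment words), used only to make the sketch run.
SENTIMENT_LEXICON = {"good", "great", "bad", "awful", "love", "hate"}
SYNONYMS = {
    "good": ["fine", "nice"],
    "great": ["excellent", "superb"],
    "bad": ["poor", "lousy"],
    "awful": ["terrible", "dreadful"],
    "love": ["adore"],
    "hate": ["detest"],
    "phone": ["handset"],
    "movie": ["film"],
}
ZERO_TOKEN = "<zero>"  # hypothetical placeholder standing in for a zeroed-out word


def log_odds_scores(corpus, labels, smoothing=1.0):
    """Assumed discriminative-power measure: absolute difference between the
    smoothed log-odds of a word appearing in class-1 vs class-0 documents."""
    pos, neg = Counter(), Counter()
    for tokens, y in zip(corpus, labels):
        (pos if y == 1 else neg).update(set(tokens))  # document frequency per class
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    scores = {}
    for w in set(pos) | set(neg):
        p = (pos[w] + smoothing) / (n_pos + 2 * smoothing)
        q = (neg[w] + smoothing) / (n_neg + 2 * smoothing)
        scores[w] = abs(math.log(p / (1 - p)) - math.log(q / (1 - q)))
    return scores


def relevance(word, low=0.1):
    """Assumed sentiment relevance: 1.0 for lexicon words, a small constant otherwise."""
    return 1.0 if word in SENTIMENT_LEXICON else low


def weighted_synonym_replacement(tokens, disc, n_replace, rng):
    """Operation 1 (sketch): choose words to replace with probability proportional
    to discriminative power x relevance, instead of uniformly at random."""
    out = list(tokens)
    candidates = [i for i, t in enumerate(tokens) if t in SYNONYMS]
    weights = [disc.get(tokens[i], 0.0) * relevance(tokens[i]) for i in candidates]
    for _ in range(min(n_replace, len(candidates))):
        if sum(weights) <= 0:
            break  # nothing left with positive weight
        k = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        i = candidates[k]
        out[i] = rng.choice(SYNONYMS[tokens[i]])
        weights[k] = 0.0  # never pick the same position twice
    return out


def mask_spurious_words(tokens, disc, threshold):
    """Operation 2 (sketch): zero-mask words that are discriminative in the training
    data (score >= threshold) but not relevant to sentiment (not in the lexicon)."""
    return [
        ZERO_TOKEN if disc.get(t, 0.0) >= threshold and t not in SENTIMENT_LEXICON else t
        for t in tokens
    ]


if __name__ == "__main__":
    corpus = [
        "great phone love it".split(),
        "good phone great screen".split(),
        "bad movie hate it".split(),
        "awful movie bad plot".split(),
    ]
    labels = [1, 1, 0, 0]
    disc = log_odds_scores(corpus, labels)
    sentence = "great phone love it".split()
    print(weighted_synonym_replacement(sentence, disc, n_replace=1, rng=random.Random(0)))
    print(mask_spurious_words(sentence, disc, threshold=1.0))
```

In this toy corpus, "phone" occurs only in positive reviews and "movie" only in negative ones, so both receive high discriminative scores without carrying sentiment. The masking step therefore prints `['great', '<zero>', 'love', 'it']`, while the weighted sampling step favors "great" and "love" over "phone" when choosing a word to replace.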
