A SimCSE-based model for sentiment analysis in Chinese text messages

Bibliographic Details
Main Author: Song, Haiyang
Other Authors: -
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2024
Subjects:
Online Access: https://hdl.handle.net/10356/177139
Institution: Nanyang Technological University
Description
Summary: Existing sentiment analysis algorithms mainly focus on vectorized text representations and on constructing high-quality deep learning classifiers. However, improving sentence embedding methods could also enhance text sentiment classification models. This project introduces a model for text-level sentiment classification that uses contrastive learning and the BERT pre-trained language model. The model combines SimCSE with self-supervised BERT training through contrastive learning. It converts a simple text-level sentiment analysis dataset into sentence pairs through back translation and constructs a siamese network of two BERT encoders that share the same structure and parameters. Feeding the back-translated text pairs into the BERT encoders yields sentence representation vectors. The model is optimized by summing the loss functions and back-propagating the result. Finally, one side of the trained siamese BERT network is applied to a supervised classification module for Chinese text sentiment classification. Experimental validation on three Chinese datasets (waimai_10k, ChnSentiCorp_htl_all, and online_shopping_10_cats) shows that the model outperforms several cutting-edge text-level sentiment classification models. Keywords: Natural Language Processing; Sentiment Analysis; Contrastive Learning; Siamese Network.
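
The contrastive step described in the summary can be illustrated with a minimal sketch. The summary does not give the exact loss, so this assumes the in-batch InfoNCE objective commonly used with SimCSE: each original sentence embedding is pulled toward the embedding of its own back translation and pushed away from the other back translations in the batch. The function names, the temperature of 0.05, and the two-dimensional toy vectors are illustrative assumptions; in the thesis the embeddings would come from the shared-weight BERT encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(originals, back_translated, temperature=0.05):
    """In-batch contrastive loss (assumed form, not taken from the thesis).

    originals[i] and back_translated[i] are embeddings of a sentence and
    its back translation; every other back translation in the batch is
    treated as a negative for originals[i].
    """
    if len(originals) != len(back_translated):
        raise ValueError("both sides of the pair batch must have equal size")
    losses = []
    for i, anchor in enumerate(originals):
        logits = [cosine(anchor, cand) / temperature for cand in back_translated]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Cross-entropy with the matching pair (index i) as the target.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings standing in for encoder outputs (hypothetical values).
originals = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each pair points the same way
swapped = [[0.1, 0.9], [0.9, 0.1]]   # pairs are mismatched

loss_aligned = info_nce_loss(originals, aligned)
loss_swapped = info_nce_loss(originals, swapped)
print(loss_aligned < loss_swapped)
```

The loss is small when each sentence is closest to its own back translation and large when the pairs are mismatched, which is the signal that back-propagation would use to adjust the shared encoder weights.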