Automatic lexicon construction for domain-specific sentiment analysis : a frame-based approach

Bibliographic Details
Main Author: Tan, Sang Sang
Other Authors: Na Jin Cheon
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2020
Online Access: https://hdl.handle.net/10356/143088
Institution: Nanyang Technological University
Description
Summary: The need to gather useful information efficiently from a vast amount of online data has given rise to the interdisciplinary field of automated sentiment analysis. In general, sentiment analysis researchers agree that detecting implicit sentiments is more challenging than recognizing explicit ones. While the latter can be recognized more easily through the presence of sentiment-laden words, the former are conveyed indirectly, usually by describing a situation that invokes or can be interpreted as related to a specific sentiment. Attempts to deal with the challenge of implicit sentiments have leveraged sentiment lexicons containing phrasal entries capable of capturing subtle clues for inferring sentiments. The present research aims to establish an automated lexicon construction process to build phrasal sentiment lexicons by extracting regular phrasal patterns from domain-specific text corpora. The key challenge in detecting such patterns is that most phrasal units have low frequencies in text corpora and occur in different lexical forms even when they convey similar semantics. For instance, in beauty product reviews, ‘sting my face’ and ‘burn my skin’ are lexically different phrases that describe the same negative experience. Because many such phrasal variants exist, each with a low frequency, it is challenging to detect them in text corpora. This issue is related to the common language phenomenon of word sparsity. To address this problem, the present research set out to investigate the plausibility of a linguistically motivated approach that uses semantic frames in FrameNet for frame-based semantic abstraction. The key idea is that, by creating abstract semantic representations for texts, semantically similar phrasal units represented by the same frames will be grouped together, creating a more compact and less sparse textual space that facilitates the detection of regular phrasal patterns.
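The grouping effect described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the lookup table `TOY_FRAMES`, the labels `FRAME_HARM` and `FRAME_BODY_PART`, and the helper `abstract_phrase` are hypothetical placeholders invented for this example and are not taken from FrameNet. A real pipeline would assign frames with a frame identification model rather than a fixed table.

```python
from collections import Counter

# Hypothetical word-to-frame table, for illustration only. The labels are
# placeholders, not actual FrameNet frames.
TOY_FRAMES = {
    "sting": "FRAME_HARM",
    "burn": "FRAME_HARM",
    "irritate": "FRAME_HARM",
    "face": "FRAME_BODY_PART",
    "skin": "FRAME_BODY_PART",
}


def abstract_phrase(phrase):
    """Replace each word that has a toy frame with that frame's label."""
    return tuple(TOY_FRAMES.get(word, word) for word in phrase.lower().split())


phrases = ["sting my face", "burn my skin", "irritate my skin"]

# Surface forms: every phrase is distinct, so each count is 1.
surface_counts = Counter(tuple(p.lower().split()) for p in phrases)

# Abstracted forms: all three phrases collapse into one shared pattern.
frame_counts = Counter(abstract_phrase(p) for p in phrases)

print(len(surface_counts), "distinct surface phrases")
print(len(frame_counts), "distinct abstracted patterns")
```

In this toy corpus the three low-frequency surface phrases become a single pattern with a count of three, which is the kind of reduction in sparsity that makes regular patterns easier to detect.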
Two prerequisites need to be in place for this frame-based abstraction approach to work properly. First, the frame identification problem must be dealt with adequately. Frame identification is a common subtask in frame-semantic parsing. It disambiguates a marked target in a sentence (a word or a phrase) and associates it with the most likely frame in FrameNet. In this research, a robust frame identification model called Positional Attention-based Frame Identification with BERT (PAFIBERT) was developed, which outperformed existing solutions and proved robust in handling unseen targets (i.e., out-of-dictionary targets not listed in FrameNet). The second prerequisite for frame-based semantic abstraction is related to the degree of abstraction provided by FrameNet’s frames. Since FrameNet was not initially created for sentiment-related tasks, some information that is essential for distinguishing positive opinions from negative ones is not encoded in the frames. The present research incorporated an appraisal concept called graduation into FrameNet to provide discriminative sentiment-related information, resulting in a sentiment-aware version of FrameNet known as GradFrameNet. Using GradFrameNet and PAFIBERT, this research demonstrated that the frame-based abstraction approach did not cause substantial loss of information but rather generalized in a sentiment-aware manner to extract valuable phrasal clues for sentiment analysis. With these two prerequisites fulfilled, the final stage of this research established a domain-specific sentiment lexicon construction process that applied frame-based semantic abstraction to increase the coverage of phrasal lexicons. From the performance gain obtained by incorporating the lexicons into sentiment classifiers, it was found that the integration of semantic abstraction was beneficial to the lexicon construction process despite the noise introduced by the imperfect frame identification step.
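As a rough illustration of how abstracted patterns might feed a phrasal lexicon, the Python sketch below scores each pattern by how often it occurs in positive versus negative reviews. This is a generic corpus-based scoring scheme assumed for the example, not the construction process used in the thesis. The function `build_lexicon`, the `min_count` threshold, the placeholder labels such as `FRAME_HARM`, and the toy data are all hypothetical.

```python
from collections import defaultdict


def build_lexicon(labeled_patterns, min_count=2):
    """Score each pattern in [-1, 1] from its co-occurrence with review labels.

    labeled_patterns: iterable of (pattern, label) pairs, label "pos" or "neg".
    Patterns seen fewer than min_count times are dropped as unreliable.
    """
    counts = defaultdict(lambda: [0, 0])  # pattern -> [positive, negative]
    for pattern, label in labeled_patterns:
        counts[pattern][0 if label == "pos" else 1] += 1

    lexicon = {}
    for pattern, (pos, neg) in counts.items():
        if pos + neg >= min_count:
            lexicon[pattern] = (pos - neg) / (pos + neg)
    return lexicon


# Toy input: patterns already abstracted with placeholder frame labels.
data = [
    (("FRAME_HARM", "my", "FRAME_BODY_PART"), "neg"),
    (("FRAME_HARM", "my", "FRAME_BODY_PART"), "neg"),
    (("FRAME_HARM", "my", "FRAME_BODY_PART"), "neg"),
    (("FRAME_SOOTHE", "my", "FRAME_BODY_PART"), "pos"),
    (("FRAME_SOOTHE", "my", "FRAME_BODY_PART"), "pos"),
    (("smells", "odd"), "neg"),
]

lexicon = build_lexicon(data)
for pattern, score in lexicon.items():
    print(" ".join(pattern), score)
```

Because abstraction pools the counts of lexical variants, a pattern is more likely to clear the frequency threshold than any single surface phrase would be, which is one way abstraction can raise lexicon coverage.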
This work makes several original contributions. First, it contributes to the existing knowledge of corpus-based lexicon construction by addressing the issue of lexicon coverage and dealing specifically with the difficulty of extracting phrasal patterns from sparse textual corpora using a novel frame-based abstraction approach. As one of the first efforts to examine the potential of FrameNet in a rather novel setting and an early attempt to realize the graduation phenomenon on a large scale, this research enhances the understanding of how semantic frames can be combined with the graduation concept to provide abstract yet semantically rich and sentiment-aware text representations. On a practical level, GradFrameNet and PAFIBERT also benefit the community of researchers and practitioners as useful linguistic resources or tools for sentiment analysis. In addition, the findings in this research shed light on the potential of the frame-based abstraction approach in confronting the fundamental problem of word sparsity, which is an inherent characteristic of textual data that poses additional challenges for various text mining tasks. Finally, since frame identification is a pivotal subtask in frame-semantic parsing, this research also makes an important contribution to the field by providing a model that achieved new state-of-the-art results.