Linguistic-inspired Chinese sentiment analysis : from characters to radicals and phonetics

Sentiment analysis or opinion mining is a task concerning identifying, extracting and quantifying the sentiment orientations or affective states. The task utilizes a synthesis of techniques like natural language processing, computational linguistics, text mining and so forth. Under its big umbrella,...

Bibliographic Details
Main Author:	Peng, Haiyun
Other Authors:	Erik Cambria
Format:	Theses and Dissertations
Language:	English
Published:	2019
Subjects:	DRNTU::Engineering::Computer science and engineering
Online Access:	https://hdl.handle.net/10356/84297 http://hdl.handle.net/10220/48173
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-84297
record_format	dspace
institution	Nanyang Technological University
building	NTU Library
country	Singapore
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Computer science and engineering
description	Sentiment analysis, or opinion mining, is the task of identifying, extracting, and quantifying sentiment orientations or affective states. It draws on a combination of techniques from natural language processing, computational linguistics, text mining, and related fields. Under its umbrella are various sub-tasks, such as subjectivity detection, sentiment classification, named entity recognition, and sarcasm detection. Much of the research on these tasks has been conducted on English, owing to its international popularity and the resulting abundance of language resources. Although this research can be applied to other Indo-European languages, it performs poorly on many oriental languages, especially Chinese, because of the specific characteristics of the Chinese language. Inspired by linguistics, this thesis discusses the situations and features that distinguish Chinese from English and proposes corresponding approaches to exploit them. We began by reviewing the literature on Chinese sentiment analysis and noticed that existing Chinese sentiment resources were relatively scarce compared with those for other languages. This scarcity was reflected in two aspects: no semantic connections between words and no fine-grained measure of sentiment intensity. We therefore proposed an unsupervised method to construct a semantically connected valence Chinese sentiment resource. The mapping-based method leveraged multiple multilingual and sentiment resources, such as WordNet. Next, we found that Chinese word segmentation could be a source of errors in sentiment analysis, especially in specialized domains such as finance or medicine. In addition, we observed that intra-character components (radicals) of Chinese text carry semantics, because Chinese characters originate from pictograms (or ideograms).
To this end, we proposed a radical-based hierarchical character embedding that skips the word segmentation step and injects intra-character semantics into the text representation. The new text representation outperformed word-level representations by a considerable margin in the sentiment classification task. When we tried to extend the hierarchical embedding to the aspect-based sentiment analysis (ABSA) task, we realized that existing methods all tend to represent a multi-word aspect target by the average of its word embeddings. This works in English as long as the proportion of multi-word aspect targets is relatively low. However, almost all Chinese aspect targets are multi-character targets. Thus, we introduced an aspect target sequence modeling (ATSM) network that learns adaptive aspect target representations based on sentence context, and an ATSM-Fusion network that accounts for the multi-granularity of Chinese text. The ATSM model alone achieved state-of-the-art performance in English ABSA, and ATSM-Fusion pushed Chinese ABSA performance higher. In addition to addressing Chinese sentiment analysis through the textual modality, we proposed incorporating phonetic information into textual sentiment analysis. We introduced two effective features to encode phonetic information. Then, we developed a disambiguate intonation for sentiment analysis (DISA) network using a reinforcement network. It disambiguates the intonation of each Chinese character (pinyin), so that a precise phonetic representation of Chinese is learned. Furthermore, we fused phonetic features with textual and visual features in order to mimic the way humans read and understand Chinese text.
Experimental results show that the inclusion of phonetic features significantly and consistently improves the performance of textual and visual representations. In summary, this thesis introduces several approaches to Chinese sentiment analysis that address and exploit the linguistic characteristics (e.g., compositionality, multi-granularity, phonology) that distinguish Chinese from other languages.
author2	Erik Cambria
author_facet	Erik Cambria Peng, Haiyun
format	Theses and Dissertations
author	Peng, Haiyun
author_sort	Peng, Haiyun
title	Linguistic-inspired Chinese sentiment analysis : from characters to radicals and phonetics
publishDate	2019
url	https://hdl.handle.net/10356/84297 http://hdl.handle.net/10220/48173
_version_	1681057862995935232
spelling	sg-ntu-dr.10356-84297 2020-07-01T05:17:27Z Linguistic-inspired Chinese sentiment analysis : from characters to radicals and phonetics Peng, Haiyun Erik Cambria School of Computer Science and Engineering DRNTU::Engineering::Computer science and engineering Doctor of Philosophy 2019-05-13T05:44:42Z 2019-12-06T15:42:20Z 2019-05-13T05:44:42Z 2019-12-06T15:42:20Z 2019 Thesis Peng, H. (2019). Linguistic-inspired Chinese sentiment analysis : from characters to radicals and phonetics. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/84297 http://hdl.handle.net/10220/48173 10.32657/10220/48173 en 143 p. application/pdf

Similar Items