Aspect discovery from product reviews

Bibliographic Details
Main Author: DING, Ying
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2017
Subjects:
Online Access: https://ink.library.smu.edu.sg/etd_coll_all/24
https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1030&context=etd_coll_all
Institution: Singapore Management University
id sg-smu-ink.etd_coll_all-1030
record_format dspace
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic opinion mining
topic models
deep learning
recommender systems
data mining
machine learning
Databases and Information Systems
description With the rapid growth of online shopping sites and social media, product reviews are accumulating quickly. These reviews contain information that is valuable to both businesses and customers. Businesses can easily collect a large amount of feedback on their products, which is difficult to achieve through traditional customer surveys. Customers can learn more about the products they are interested in by reading reviews, which would be difficult without online reviews. However, the sheer volume of reviews makes it impossible to read them all, so automated techniques are needed to process them efficiently.
One of the most fundamental research problems in product review analysis is aspect discovery. Aspects are components or attributes of a product or service. Aspect discovery finds the relevant terms and then clusters them into aspects. Because users often evaluate products by aspect, presenting them with aspect-level analysis is essential. Aspect discovery also serves as the basis of many downstream applications, such as aspect-level opinion summarization, rating prediction, and product recommendation.
Aspect discovery involves three basic steps. The first is defining the aspects of interest, which requires understanding and deciding what counts as an aspect. The second is identifying the words used to describe aspects, which focuses the analysis on the information most relevant to aspect discovery. The third is clustering words into aspects, so that words about the same aspect fall into the same group.
Much prior work has addressed these three steps in different ways, but limitations remain. In the first step, most existing studies assume that they can discover the aspects that people use to evaluate products. However, besides aspects, product reviews also contain another type of latent topic, which we call “properties”. Properties are attributes intrinsic to a product and are not suitable for comparing different products. In the second step, many supervised learning models have been proposed to identify aspect words. While proven effective, they require large amounts of training data and become much less useful when applied to data from a different domain. For the third step, many extensions of Latent Dirichlet Allocation (LDA) have been proposed for clustering aspect words. Most of them rely only on word co-occurrence statistics and do not consider the semantic meanings of words.
In this dissertation, we propose several new models to address some remaining problems of existing work:
1. We propose a principled model to separate product properties from aspects and connect both with ratings. The model separates them effectively, and its output helps us better understand users’ shopping behaviors and preferences.
2. We design two Recurrent Neural Network (RNN) based models that incorporate domain-independent rules into domain-specific supervised neural networks. The models improve substantially over some strong existing baselines in cross-domain aspect word identification.
3. We use word embeddings to boost traditional topic modeling of product reviews. The proposed model is more effective both at discovering meaningful aspects and at recommending products to users.
4. We propose a model that integrates an RNN with a Neural Topic Model (NTM) to jointly identify and cluster aspect words. The model discovers clearer and more coherent aspects and is also more effective at sentence clustering than the baselines.
format text
author DING, Ying
title Aspect discovery from product reviews
publisher Institutional Knowledge at Singapore Management University
publishDate 2017
url https://ink.library.smu.edu.sg/etd_coll_all/24
https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1030&context=etd_coll_all
_version_ 1712300801809973248
spelling sg-smu-ink.etd_coll_all-1030 2018-05-08T00:58:57Z Aspect discovery from product reviews DING, Ying 2017-05-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/etd_coll_all/24 https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1030&context=etd_coll_all http://creativecommons.org/licenses/by-nc-nd/4.0/ Dissertations and Theses Collection eng Institutional Knowledge at Singapore Management University