Comparison mining from text

Online product reviews are an important factor in consumers' purchase decisions. They reach into more and more spheres of our lives: we have reviews of books, electronics, groceries, entertainment, restaurants, travel experiences, etc. More than 90 percent of consumers read online reviews before they...

Bibliographic Details
Main Author: TKACHENKO, Maksim
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2018
Subjects:
Online Access: https://ink.library.smu.edu.sg/etd_coll/161
https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1161&context=etd_coll
Institution: Singapore Management University
id sg-smu-ink.etd_coll-1161
record_format dspace
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic comparisons
graphical models
natural language processing
text mining
Databases and Information Systems
Software Engineering
description Online product reviews are an important factor in consumers' purchase decisions. They reach into more and more spheres of our lives: we have reviews of books, electronics, groceries, entertainment, restaurants, travel experiences, etc. According to various consumer surveys, more than 90 percent of consumers read online reviews before they purchase products. This observation suggests that product review information enhances the consumer experience and helps consumers make better-informed purchase decisions. An enormous number of online reviews are posted on e-commerce platforms such as Amazon, Apple, Yelp, and TripAdvisor. They vary in the information they carry and reflect their authors' different experiences and preferences. If online opinions are indeed important in many spheres of our lives, then their systematic analysis is a real-life problem. Given the enormous number of opinions scattered across the Web, manual analysis carries a prohibitive cost in time and effort. An alternative is automated or, more appropriately, semi-automated analysis conducted by computers in assistance to human analysts. Text processing applications have received much attention in the past three decades and have proven successful for language understanding. Comparison mining aims at understanding opinion mining problems in which multiple entities are present simultaneously. This includes, but is not limited to, deriving similarities and differences between entities and discovering information about the relations between entities. The entities may be products, individuals, issues, etc. Comparison arises in the form of joint evaluative statements, such as "I think A is better than B" or "I think A is a good alternative to B", and introduces new research questions, similar to and yet different from those of traditional opinion mining. How do we find these statements in a review? How do we interpret these statements? How do we make sense of thousands of such comparisons?
In this study, we seek to answer these questions and propose a set of related computational solutions. First, we investigate the comparison identification problem and cast it as a relation extraction problem. Within the relation extraction setup, we develop a new approach for identifying comparative relations. A formal investigation of the syntactic structure of comparative statements leads us to a kernel-based approach, which relies on the dependency structure of sentences. The proposed method shows state-of-the-art results for the comparison identification problem. Second, we explore intrinsic properties of a comparative corpus to derive a joint model for comparison interpretation and aggregation. At the level of individual comparisons, the model seeks to derive the comparison outcome of a statement, i.e., which entity is preferred by the writer. At the aggregated level, it seeks to understand the overall ranking of the entities in a corpus of comparisons. The proposed model is shown to be superior to approaches that tackle each level separately. An empirical evaluation demonstrates its effectiveness on real-world datasets. Third, we look at the phenomenon of comparison disagreement, i.e., the fact that different users may have different preferences over the same set of entities. To capture this diversity, we propose a model for preference clustering and demonstrate its effectiveness and utility. Fourth, we propose a method for explaining entity comparisons when entities are identified by their textual representations. CompareLDA, a supervised topic model, is employed to align topics (distributions of co-occurring words) with comparisons, so that the topics are indicative of the "better" and "worse" entities. Through an empirical evaluation, we show that the proposed model is more effective for capturing comparisons than alternative supervised topic models.
Together, the proposed methods make a substantial contribution to comparison mining research and facilitate a better understanding of opinion language.
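To make the aggregation step concrete, the following is a minimal sketch of how already-interpreted comparisons (pairs of preferred and non-preferred entities) might be turned into an overall ranking. It uses a smoothed Bradley-Terry estimator as a generic stand-in and is not the joint model proposed in the thesis; the function name, the pseudo-count, and the toy data are assumptions made for illustration.

```python
from collections import defaultdict


def bradley_terry(comparisons, iterations=100, pseudo_count=0.1):
    """Estimate one strength score per entity from (winner, loser) pairs.

    Generic smoothed Bradley-Terry fit via the classic iterative update;
    a stand-in for illustration, not the thesis's joint model.
    """
    wins = defaultdict(int)         # comparisons won by each entity
    pair_counts = defaultdict(int)  # comparisons seen per unordered pair
    entities = set()
    for winner, loser in comparisons:
        if winner == loser:
            continue  # a self-comparison carries no preference information
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        entities.update((winner, loser))

    scores = {e: 1.0 for e in entities}
    for _ in range(iterations):
        updated = {}
        for e in entities:
            denom = 0.0
            for pair, n in pair_counts.items():
                if e in pair:
                    (other,) = pair - {e}
                    denom += n / (scores[e] + scores[other])
            # The pseudo-count (an assumption) keeps never-winning
            # entities from collapsing to a score of exactly zero.
            updated[e] = (wins[e] + pseudo_count) / denom
        total = sum(updated.values())
        scores = {e: s / total for e, s in updated.items()}
    return scores


if __name__ == "__main__":
    # Hypothetical outcomes extracted from review sentences.
    toy = [("A", "B"), ("A", "B"), ("B", "C"), ("A", "C"), ("C", "B")]
    result = bradley_terry(toy)
    for entity in sorted(result, key=result.get, reverse=True):
        print(entity, round(result[entity], 3))
```

The scores are normalized to sum to one, so they can be read as relative strengths; sorting by score gives the aggregated ranking.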
format text
author TKACHENKO, Maksim
title Comparison mining from text
publisher Institutional Knowledge at Singapore Management University
publishDate 2018
url https://ink.library.smu.edu.sg/etd_coll/161
https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1161&context=etd_coll
_version_ 1712300914501484544
spelling sg-smu-ink.etd_coll-1161 2019-04-10T03:25:51Z Comparison mining from text TKACHENKO, Maksim 2018-12-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/etd_coll/161 https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1161&context=etd_coll http://creativecommons.org/licenses/by-nc-nd/4.0/ Dissertations and Theses Collection (Open Access) eng Institutional Knowledge at Singapore Management University comparisons graphical models natural language processing text mining Databases and Information Systems Software Engineering