Distinguishing between authentic and fictitious user-generated hotel reviews

The objective of this paper is to distinguish between authentic and fictitious user-generated hotel reviews. To achieve this objective, it adopts a two-step approach. The first step seeks to classify authentic and fictitious reviews by leveraging their possible textual differences. The second step att...

Bibliographic Details
Main Authors: Banerjee, Snehasish, Chua, Alton Y. K., Jung-Jae Kim
Other Authors: Wee Kim Wee School of Communication and Information
Format: Conference or Workshop Item
Language: English
Published: 2016
Subjects: Classification algorithms; Data mining; Machine learning; Text analysis
Online Access: https://hdl.handle.net/10356/82626
http://hdl.handle.net/10220/40089
Institution: Nanyang Technological University
id sg-ntu-dr.10356-82626
record_format dspace
spelling sg-ntu-dr.10356-82626 2020-03-07T12:15:48Z
Conference: 2015 6th International Conference on Computing, Communication and Networking Technologies (ICCCNT)
Version: Accepted version
Timestamps: 2016-02-24T03:26:04Z, 2019-12-06T14:59:13Z
Type: Conference Paper (2015)
Citation: Banerjee, S., Chua, A. Y. K., & Jung-Jae Kim. (2015). Distinguishing between authentic and fictitious user-generated hotel reviews. 2015 6th International Conference on Computing, Communication and Networking Technologies (ICCCNT), 1-7.
Handles: https://hdl.handle.net/10356/82626, http://hdl.handle.net/10220/40089
DOI: 10.1109/ICCCNT.2015.7395179
Language: en
Rights: © 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: [http://dx.doi.org/10.1109/ICCCNT.2015.7395179].
Extent: 7 p., application/pdf
institution Nanyang Technological University
building NTU Library
country Singapore
collection DR-NTU
language English
topic Classification algorithms
Data mining
Machine learning
Text analysis
description The objective of this paper is to distinguish between authentic and fictitious user-generated hotel reviews. To achieve this objective, it adopts a two-step approach. The first step seeks to classify authentic and fictitious reviews by leveraging their possible textual differences. The second step attempts to identify the textual traits that are unique to authentic and fictitious reviews. For the purpose of this paper, a ground truth dataset of 1,800 reviews, uniformly divided between authentic and fictitious, was created. With respect to the first step, authentic and fictitious reviews were classified by using four forms of textual differences: understandability, level of detail, writing style, and cognition indicators. Classification was performed using voting by average probability among logistic regression, C4.5, Support Vector Machine, JRip, and Random Forest classifiers. Using five-fold cross-validation, the proposed approach was found to outperform two existing baselines. Furthermore, with respect to the second step, the textual traits unique to authentic and fictitious reviews were identified using Information Gain and Chi-squared feature selection techniques. A sequential forward feature selection approach was further adopted to identify the top five features that aid the classification of authentic and fictitious reviews. These include the use of nouns, articles, function words, punctuation, and in particular, exclamation points in reviews. The implications of the results are discussed.
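The "voting by average probability" step named in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the 0.5 decision threshold, and the example probabilities are all assumptions made for illustration, and the five base classifiers are represented only by hypothetical probability outputs for one review.

```python
from statistics import fmean


def vote_by_average_probability(probabilities):
    """Combine per-classifier estimates of P(fictitious) for one review.

    `probabilities` holds one value in [0, 1] per base classifier.
    Returns (label, averaged_probability). The 0.5 cut-off is an
    assumption of this sketch, not a detail taken from the record.
    """
    if not probabilities:
        raise ValueError("need at least one classifier probability")
    avg = fmean(probabilities)
    label = "fictitious" if avg >= 0.5 else "authentic"
    return label, avg


# Hypothetical P(fictitious) outputs from the five base classifiers.
example = {
    "logistic_regression": 0.95,
    "c45": 0.90,
    "svm": 0.45,
    "jrip": 0.40,
    "random_forest": 0.45,
}

label, avg = vote_by_average_probability(list(example.values()))
print(label, round(avg, 2))  # fictitious 0.63
```

In this made-up example three of the five classifiers lean towards "authentic", so a simple majority vote would say "authentic", while averaging the probabilities lets the two confident classifiers tip the result to "fictitious".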
author2 Wee Kim Wee School of Communication and Information
format Conference or Workshop Item
author Banerjee, Snehasish
Chua, Alton Y. K.
Jung-Jae Kim
author_sort Banerjee, Snehasish
title Distinguishing between authentic and fictitious user-generated hotel reviews