Lexicon-based sentiment analysis: Comparative evaluation of six sentiment lexicons

This article introduces a new general-purpose sentiment lexicon called WKWSCI Sentiment Lexicon and compares it with five existing lexicons: Hu & Liu Opinion Lexicon, Multi-perspective Question Answering (MPQA) Subjectivity Lexicon, General Inquirer, National Research Council Canada (NRC) Word-S...

Bibliographic Details
Main Authors: Khoo, Christopher S. G., Johnkhan, Sathik Basha
Other Authors: Wee Kim Wee School of Communication and Information
Format: Article
Language: English
Published: 2017
Subjects: Sentiment analysis
Sentiment categorisation
Online Access: https://hdl.handle.net/10356/83570
http://hdl.handle.net/10220/42704
https://doi.org/10.21979/N9/DWWEBV
Institution: Nanyang Technological University
id sg-ntu-dr.10356-83570
record_format dspace
spelling sg-ntu-dr.10356-83570 2021-01-18T04:50:20Z
Lexicon-based sentiment analysis: Comparative evaluation of six sentiment lexicons
Khoo, Christopher S. G.
Johnkhan, Sathik Basha
Wee Kim Wee School of Communication and Information
Sentiment analysis
Sentiment categorisation
Accepted version
2017-06-14T07:38:22Z 2019-12-06T15:25:51Z 2017-06-14T07:38:22Z 2019-12-06T15:25:51Z
2017
Journal Article
Khoo, C. S. G., & Johnkhan, S. B. (2017). Lexicon-based sentiment analysis: Comparative evaluation of six sentiment lexicons. Journal of Information Science, in press.
0165-5515
https://hdl.handle.net/10356/83570
http://hdl.handle.net/10220/42704
10.1177/0165551517703514
en
Journal of Information Science
https://doi.org/10.21979/N9/DWWEBV
© 2017 The Author(s) (published by SAGE Publications). This is the author created version of a work that has been peer reviewed and accepted for publication in Journal of Information Science, published by SAGE Publications on behalf of the author(s). It incorporates referee’s comments but changes resulting from the publishing process, such as copyediting, structural formatting, may not be reflected in this document. The published version is available at: [http://dx.doi.org/10.1177/0165551517703514].
21 p.
application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Sentiment analysis
Sentiment categorisation
description This article introduces a new general-purpose sentiment lexicon called WKWSCI Sentiment Lexicon and compares it with five existing lexicons: Hu & Liu Opinion Lexicon, Multi-perspective Question Answering (MPQA) Subjectivity Lexicon, General Inquirer, National Research Council Canada (NRC) Word-Sentiment Association Lexicon and Semantic Orientation Calculator (SO-CAL) lexicon. The effectiveness of the sentiment lexicons for sentiment categorisation at the document level and sentence level was evaluated using an Amazon product review data set and a news headlines data set. WKWSCI, MPQA, Hu & Liu and SO-CAL lexicons are equally good for product review sentiment categorisation, obtaining accuracy rates of 75%–77% when appropriate weights are used for different categories of sentiment words. However, when a training corpus is not available, Hu & Liu obtained the best accuracy with a simple-minded approach of counting positive and negative words for both document-level and sentence-level sentiment categorisation. The WKWSCI lexicon obtained the best accuracy of 69% on the news headlines sentiment categorisation task, and the sentiment strength values obtained a Pearson correlation of 0.57 with human-assigned sentiment values. It is recommended that the Hu & Liu lexicon be used for product review texts and the WKWSCI lexicon for non-review texts.
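The "simple-minded approach of counting positive and negative words" mentioned in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the tiny word sets, the function name count_based_sentiment, the regex tokeniser, and the rule that a tie is labelled "neutral" are all assumptions made for illustration, and none of the entries are taken from the six lexicons evaluated in the article.

```python
import re

# Hypothetical toy lexicon for illustration only; these words are NOT
# taken from WKWSCI, Hu & Liu, MPQA, or any other lexicon in the article.
POSITIVE_WORDS = {"good", "great", "excellent", "love"}
NEGATIVE_WORDS = {"bad", "poor", "terrible", "hate"}


def count_based_sentiment(text):
    """Label a text by comparing counts of positive and negative words.

    Assumption: ties (including texts with no lexicon words) are
    labelled "neutral"; the article may handle ties differently.
    """
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    positive_count = sum(1 for token in tokens if token in POSITIVE_WORDS)
    negative_count = sum(1 for token in tokens if token in NEGATIVE_WORDS)
    if positive_count > negative_count:
        return "positive"
    if negative_count > positive_count:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(count_based_sentiment("Great phone, I love it"))
    print(count_based_sentiment("Terrible battery and poor screen"))
```

Because this approach needs no training corpus, it corresponds to the setting in which the description reports Hu & Liu as the most accurate lexicon; the weighted variant described for the 75%–77% results would instead require weights estimated from labelled data.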
author2 Wee Kim Wee School of Communication and Information
author_facet Wee Kim Wee School of Communication and Information
Khoo, Christopher S. G.
Johnkhan, Sathik Basha
format Article
author Khoo, Christopher S. G.
Johnkhan, Sathik Basha
author_sort Khoo, Christopher S. G.
title Lexicon-based sentiment analysis: Comparative evaluation of six sentiment lexicons
publishDate 2017
url https://hdl.handle.net/10356/83570
http://hdl.handle.net/10220/42704
https://doi.org/10.21979/N9/DWWEBV