From classification to quantification in tweet sentiment analysis

Sentiment classification has become a ubiquitous enabling technology in the Twittersphere, since classifying tweets according to the sentiment they convey towards a given entity (be it a product, a person, a political party, or a policy) has many applications in political science, social science, mar...

Bibliographic Details
Main Authors: GAO, Wei, SEBASTIANI, Fabrizio
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2016
Subjects: Databases and Information Systems
Online Access: https://ink.library.smu.edu.sg/sis_research/4547
https://ink.library.smu.edu.sg/context/sis_research/article/5550/viewcontent/classification_quantification_tweet.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-5550
record_format dspace
spelling sg-smu-ink.sis_research-5550 2019-12-26T09:05:17Z From classification to quantification in tweet sentiment analysis GAO, Wei SEBASTIANI, Fabrizio Sentiment classification has become a ubiquitous enabling technology in the Twittersphere, since classifying tweets according to the sentiment they convey towards a given entity (be it a product, a person, a political party, or a policy) has many applications in political science, social science, market research, and many others. In this paper, we contend that most previous studies dealing with tweet sentiment classification (TSC) use a suboptimal approach. The reason is that the final goal of most such studies is not estimating the class label (e.g., Positive, Negative, or Neutral) of individual tweets, but estimating the relative frequency (a.k.a. “prevalence”) of the different classes in the dataset. The latter task is called quantification, and recent research has convincingly shown that it should be tackled as a task of its own, using learning algorithms and evaluation measures different from those used for classification. In this paper, we show (by carrying out experiments using two learners, seven quantification-specific algorithms, and 11 TSC datasets) that using quantification-specific algorithms produces substantially better class frequency estimates than a state-of-the-art classification-oriented algorithm routinely used in TSC. We thus argue that researchers interested in tweet sentiment prevalence should switch to quantification-specific (instead of classification-specific) learning algorithms and evaluation measures. This is an extended version of a paper with the title “Tweet Sentiment: From Classification to Quantification” which appears in the Proceedings of the 6th ACM/IEEE International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2015).
2016-04-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/4547 info:doi/10.1007/s13278-016-0327-z https://ink.library.smu.edu.sg/context/sis_research/article/5550/viewcontent/classification_quantification_tweet.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Databases and Information Systems
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Databases and Information Systems
spellingShingle Databases and Information Systems
GAO, Wei
SEBASTIANI, Fabrizio
From classification to quantification in tweet sentiment analysis
description Sentiment classification has become a ubiquitous enabling technology in the Twittersphere, since classifying tweets according to the sentiment they convey towards a given entity (be it a product, a person, a political party, or a policy) has many applications in political science, social science, market research, and many others. In this paper, we contend that most previous studies dealing with tweet sentiment classification (TSC) use a suboptimal approach. The reason is that the final goal of most such studies is not estimating the class label (e.g., Positive, Negative, or Neutral) of individual tweets, but estimating the relative frequency (a.k.a. “prevalence”) of the different classes in the dataset. The latter task is called quantification, and recent research has convincingly shown that it should be tackled as a task of its own, using learning algorithms and evaluation measures different from those used for classification. In this paper, we show (by carrying out experiments using two learners, seven quantification-specific algorithms, and 11 TSC datasets) that using quantification-specific algorithms produces substantially better class frequency estimates than a state-of-the-art classification-oriented algorithm routinely used in TSC. We thus argue that researchers interested in tweet sentiment prevalence should switch to quantification-specific (instead of classification-specific) learning algorithms and evaluation measures. This is an extended version of a paper with the title “Tweet Sentiment: From Classification to Quantification” which appears in the Proceedings of the 6th ACM/IEEE International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2015).
format text
author GAO, Wei
SEBASTIANI, Fabrizio
author_facet GAO, Wei
SEBASTIANI, Fabrizio
author_sort GAO, Wei
title From classification to quantification in tweet sentiment analysis
title_short From classification to quantification in tweet sentiment analysis
title_full From classification to quantification in tweet sentiment analysis
title_fullStr From classification to quantification in tweet sentiment analysis
title_full_unstemmed From classification to quantification in tweet sentiment analysis
title_sort from classification to quantification in tweet sentiment analysis
publisher Institutional Knowledge at Singapore Management University
publishDate 2016
url https://ink.library.smu.edu.sg/sis_research/4547
https://ink.library.smu.edu.sg/context/sis_research/article/5550/viewcontent/classification_quantification_tweet.pdf
_version_ 1770574910590025728