Tweet sentiment: From classification to quantification
Sentiment classification has become a ubiquitous enabling technology in the Twittersphere, since classifying tweets according to the sentiment they convey towards a given entity (be it a product, a person, a political party, or a policy) has many applications in political science, social science, ma...
Main Authors: | GAO, Wei; SEBASTIANI, Fabrizio |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2015 |
Subjects: | Databases and Information Systems |
Online Access: | https://ink.library.smu.edu.sg/sis_research/4574 https://ink.library.smu.edu.sg/context/sis_research/article/5577/viewcontent/p97_gao.pdf |
Institution: | Singapore Management University |
id |
sg-smu-ink.sis_research-5577 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-5577; 2019-12-26T08:16:49Z; Tweet sentiment: From classification to quantification; GAO, Wei; SEBASTIANI, Fabrizio; 2015-08-01T07:00:00Z; text; application/pdf; https://ink.library.smu.edu.sg/sis_research/4574; info:doi/10.1145/2808797.2809327; https://ink.library.smu.edu.sg/context/sis_research/article/5577/viewcontent/p97_gao.pdf; http://creativecommons.org/licenses/by-nc-nd/4.0/; Research Collection School Of Computing and Information Systems; eng; Institutional Knowledge at Singapore Management University; Databases and Information Systems |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
Databases and Information Systems |
description |
Sentiment classification has become a ubiquitous enabling technology in the Twittersphere, since classifying tweets according to the sentiment they convey towards a given entity (be it a product, a person, a political party, or a policy) has many applications in political science, social science, market research, and many others. In this paper we contend that most previous studies dealing with tweet sentiment classification (TSC) use a suboptimal approach. The reason is that the final goal of most such studies is not estimating the class label (e.g., Positive, Negative, or Neutral) of individual tweets, but estimating the relative frequency (a.k.a. "prevalence") of the different classes in the dataset. The latter task is called quantification, and recent research has convincingly shown that it should be tackled as a task of its own, using learning algorithms and evaluation measures different from those used for classification. In this paper we show, on a multiplicity of TSC datasets, that using a quantification-specific algorithm produces substantially better class frequency estimates than a state-of-the-art classification-oriented algorithm routinely used in TSC. We thus argue that researchers interested in tweet sentiment prevalence should switch to quantification-specific (instead of classification-specific) learning algorithms and evaluation measures. |
format |
text |
author |
GAO, Wei; SEBASTIANI, Fabrizio |
title |
Tweet sentiment: From classification to quantification |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2015 |
url |
https://ink.library.smu.edu.sg/sis_research/4574 https://ink.library.smu.edu.sg/context/sis_research/article/5577/viewcontent/p97_gao.pdf |