Should We Use the Sample? Analyzing Datasets Sampled from Twitter's Stream API

Bibliographic Details
Main Authors: WANG, Yazhe, CALLAN, Jamie, ZHENG, Baihua
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2015
Online Access: https://ink.library.smu.edu.sg/sis_research/2866
https://ink.library.smu.edu.sg/context/sis_research/article/3866/viewcontent/Should_we_use_sample_pv.pdf
Institution: Singapore Management University
Record ID: sg-smu-ink.sis_research-3866
Record format: dspace
Record timestamp: 2020-03-27T02:17:23Z
Date: 2015-06-01T07:00:00Z
File format: application/pdf
DOI: 10.1145/2746366
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Collection: Research Collection School Of Computing and Information Systems
Subjects: Twitter API; sample; data mining; Databases and Information Systems; Numerical Analysis and Scientific Computing; Social Media
Description: Researchers have begun studying content obtained from microblogging services such as Twitter to address a variety of technological, social, and commercial research questions. The large number of Twitter users and even larger volume of tweets often make it impractical to collect and maintain a complete record of activity; therefore, most research and some commercial software applications rely on samples, often relatively small samples, of Twitter data. For the most part, sample sizes have been based on availability and practical considerations. Relatively little attention has been paid to how well these samples represent the underlying stream of Twitter data. To fill this gap, this article performs a comparative analysis on samples obtained from two of Twitter’s streaming APIs with a more complete Twitter dataset to gain an in-depth understanding of the nature of Twitter data samples and their potential for use in various data mining tasks.