Using unsupervised techniques and manual analysis: A framework for discovering themes from social media posts

Bibliographic Details
Main Author: Syliongka, Leif Romeritch L.
Format: text
Language: English
Published: Animo Repository, 2014
Online Access: https://animorepository.dlsu.edu.ph/etd_masteral/4623
Institution: De La Salle University
id oai:animorepository.dlsu.edu.ph:etd_masteral-11461
record_format eprints
type Master's Theses
building De La Salle University Library
continent Asia
country Philippines
content_provider De La Salle University Library
collection DLSU Institutional Repository
description Given the role of social media in modern society, data from these sources must be organized before it can be used properly. Current technologies typically rely on supervised learning approaches, which require the development of training data. However, producing useful training data often requires accurate annotation or expert knowledge. As an alternative to manual approaches, which are impractical and costly, social scientists use Natural Language Processing (NLP) as a guide for deriving themes from a dataset. However, these automatic approaches are either biased toward frequently occurring terms or do not provide enough information to aid experts. Given these constraints, this work presents a framework that combines unsupervised methods with manual analysis for topic extraction. The data, gathered from related studies (Meier, 2012a; Meier, 2012b; Pablo, Oco, Cheng, Roldan, & Roxas, 2014), are first preprocessed and represented using a bag-of-words model with TF-IDF weighting. The entire dataset then undergoes feature reduction to lower the dimensionality of the vector space. Next, k-means clustering (k = 3, 5, and 8) is used to organize the data into categories. The silhouette coefficients of the resulting clusters indicate that the clustering suffers from the high dimensionality of the features. Because unsupervised methods produce unlabeled clusters, content analysis using open coding is then performed. Evaluation of the assigned labels yielded an agreement rate of 41.5%, while analysis of the results revealed five types of cluster behavior: (1) multi-clustered themes, (2) consistent clusters, (3) multi-topic clusters, (4) language clusters, and (5) dispersing clusters. Future work could apply an improved preprocessing technique before clustering and explore other values of k.
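The description above outlines a pipeline of bag-of-words TF-IDF weighting, feature reduction, k-means clustering for k = 3, 5 and 8, and silhouette evaluation, followed by an agreement check on manually assigned labels. What follows is a minimal standard-library Python sketch of those steps under stated assumptions. It is not the thesis's code. The toy posts, the tokenizer, the use of a document-frequency vocabulary cap (`max_features`) as the feature-reduction step, and the names `tfidf_vectors`, `kmeans`, `silhouette` and `percent_agreement` are all hypothetical. The agreement function assumes simple percent agreement between two coders, which may differ from the measure the author used.

```python
import math
import random
import re
from collections import Counter

# Hypothetical toy posts for illustration only; they are NOT the thesis datasets.
POSTS = [
    "flood water rising near the bridge, please send rescue boats",
    "rescue boats needed, flood water in our barangay",
    "power outage since last night, still no electricity",
    "no electricity and no water after the storm",
    "donate food and water for the typhoon victims",
    "relief goods and food packs distributed today",
    "praying for everyone affected by the typhoon",
    "stay safe everyone, prayers for the victims",
    "fallen trees blocked the highway near the bridge",
    "highway closed after a landslide blocked the road",
]

def tokenize(text):
    # Simplified preprocessing (assumed): lowercase, keep alphabetic tokens only.
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs, max_features=50):
    """Bag-of-words TF-IDF vectors, L2-normalized, over a reduced vocabulary."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    # Feature reduction (assumed): keep the max_features terms with the highest
    # document frequency, breaking ties alphabetically.
    vocab = sorted(df, key=lambda t: (-df[t], t))[:max_features]
    n = len(docs)
    # Smoothed IDF variant: log(n / df) + 1 keeps terms that occur in every post.
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid division by zero
        vectors.append([x / norm for x in vec])
    return vocab, vectors

def dist(a, b):
    # Euclidean distance; on unit-length vectors it orders pairs like cosine distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(vectors, k, max_iter=100, seed=0):
    """Lloyd's k-means with initial centroids sampled from the data."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: dist(v, centroids[c])) for v in vectors]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def silhouette(vectors, labels):
    """Mean silhouette coefficient s(i) = (b - a) / max(a, b) over all points."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two non-empty clusters")
    scores = []
    for i, v in enumerate(vectors):
        same = [dist(v, w) for j, w in enumerate(vectors) if labels[j] == labels[i] and j != i]
        if not same:  # singleton cluster: s(i) = 0 by convention
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(dist(v, w) for w, lab in zip(vectors, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def percent_agreement(labels_a, labels_b):
    # Share of posts given the same theme label by two coders (assumed measure).
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)

if __name__ == "__main__":
    vocab, X = tfidf_vectors(POSTS, max_features=50)
    for k in (3, 5, 8):  # the k values named in the description
        labels = kmeans(X, k)
        print(f"k={k} labels={labels} silhouette={silhouette(X, labels):.3f}")
```

Running the module prints the cluster assignments and mean silhouette coefficient for each k. On a toy corpus like this, the numbers say nothing about the thesis's reported results. For real datasets, a library implementation such as scikit-learn's `TfidfVectorizer`, `KMeans` and `silhouette_score` would normally replace these pure-Python loops.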