Social media data mining : implementing a social media data mining pipeline for personality computing

Social Media has been thoroughly integrated into the many facets of societies across the world, churning out vast quantities of valuable data that hides a multitude of insights. In recent years, many novel techniques and methods have been brought to light and made mainstream through open-source r...

Bibliographic Details
Main Author: Chia, Aloysius
Other Authors: Ke Yiping, Kelly
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2022
Subjects: Engineering::Computer science and engineering
Online Access: https://hdl.handle.net/10356/156552
Institution: Nanyang Technological University
id sg-ntu-dr.10356-156552
record_format dspace
spelling sg-ntu-dr.10356-156552; 2022-04-20T01:20:45Z; Social media data mining : implementing a social media data mining pipeline for personality computing; Chia, Aloysius; Ke Yiping, Kelly; School of Computer Science and Engineering; ypke@ntu.edu.sg; Engineering::Computer science and engineering; Bachelor of Engineering (Computer Science); 2022; Final Year Project (FYP); en; application/pdf; Nanyang Technological University
citation Chia, A. (2022). Social media data mining : implementing a social media data mining pipeline for personality computing. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/156552
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
description Social media is thoroughly integrated into many facets of societies across the world and generates vast quantities of valuable data that hide a multitude of insights. In recent years, many novel techniques and methods have come to light and become mainstream through open-source repositories. These tools have allowed practitioners to rapidly build applications that extract insights from social media data. This project implements a social media data mining pipeline for automated personality assessment and evaluation. The pipeline consists of five stages in sequence: data collection, data transformation, data preprocessing, model execution and personality evaluation. To discover how best to implement each stage, exploratory analysis was conducted to become familiar with the materials, and experiments were conducted to compare approaches. The core of the pipeline is the analysis and classification of social media users' personalities from their historical timelines, which contain their opinions, comments, ideas and interactions. Each tweet is analysed for its sentiment, emotion and personality traits. Behavioural classification follows the Big Five personality trait model and uses pre-built models, transformers and zero-shot classification. The pipeline was tested on thousands of tweets collected from Twitter using API scraping methods. It was then deployed as a proof-of-concept (PoC) web application built with Streamlit, which also includes various visualisations and options for customising the pipeline.
author2 Ke Yiping, Kelly
author_facet Ke Yiping, Kelly
Chia, Aloysius
format Final Year Project
author Chia, Aloysius
author_sort Chia, Aloysius
title Social media data mining : implementing a social media data mining pipeline for personality computing
title_sort social media data mining : implementing a social media data mining pipeline for personality computing
publisher Nanyang Technological University
publishDate 2022
url https://hdl.handle.net/10356/156552
_version_ 1731235768425250816
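
The five stages named in the description (data collection, data transformation, data preprocessing, model execution, personality evaluation) can be illustrated with a minimal Python sketch. This is not the project's implementation: every function name, the canned tweets, and the keyword lexicon are hypothetical stand-ins. The record says the project used pre-built models, transformers and zero-shot classification for the model stage; a simple keyword match is swapped in here so the sketch runs with the standard library alone.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical keyword lexicon for the Big Five traits. It stands in for
# the transformer and zero-shot models mentioned in the record.
TRAIT_KEYWORDS = {
    "openness": {"curious", "art", "idea"},
    "conscientiousness": {"plan", "deadline", "organised"},
    "extraversion": {"party", "friends", "excited"},
    "agreeableness": {"thanks", "kind", "help"},
    "neuroticism": {"worried", "anxious", "stressed"},
}


def collect(user):
    """Stage 1, data collection: canned tweets instead of a real API call."""
    return [
        {"user": user, "text": "So excited for the party with friends! https://t.co/x"},
        {"user": user, "text": "@bob thanks for the help, very kind of you"},
    ]


def transform(raw):
    """Stage 2, data transformation: keep only the text of each record."""
    return [item["text"] for item in raw]


def preprocess(texts):
    """Stage 3, preprocessing: drop URLs and mentions, lowercase, tokenise."""
    cleaned = []
    for text in texts:
        text = re.sub(r"https?://\S+|@\w+", " ", text).lower()
        cleaned.append(re.findall(r"[a-z']+", text))
    return cleaned


def run_model(tokens):
    """Stage 4, model execution: fraction of each trait's keywords present."""
    present = set(tokens)
    return {
        trait: len(words & present) / len(words)
        for trait, words in TRAIT_KEYWORDS.items()
    }


def evaluate(per_tweet):
    """Stage 5, personality evaluation: average the per-tweet trait scores."""
    collected = defaultdict(list)
    for scores in per_tweet:
        for trait, value in scores.items():
            collected[trait].append(value)
    return {trait: mean(values) for trait, values in collected.items()}


def run_pipeline(user):
    """Run the five stages in sequence for one user."""
    raw = collect(user)
    texts = transform(raw)
    tokens = preprocess(texts)
    per_tweet = [run_model(t) for t in tokens]
    return evaluate(per_tweet)


if __name__ == "__main__":
    for trait, score in run_pipeline("alice").items():
        print(f"{trait}: {score:.2f}")
```

Keeping each stage as a separate function means any one of them can be replaced, for example `run_model` by a real classifier, without changing the others.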