Topic extraction and sentiment analysis of a subreddit (r/coronavirus)

Bibliographic Details
Main Author: Chong, You Min
Other Authors: Anwitaman Datta
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2021
Online Access: https://hdl.handle.net/10356/153247
Institution: Nanyang Technological University
Description
Summary: Human emotion and individual opinion are subjective information that greatly affect how people behave and interact [1]. Text such as online posts is one way of expressing what a person is thinking and feeling. The coronavirus (COVID-19) pandemic has spread globally since the first outbreak in early 2020 and is the first global crisis since SARS in 2002. COVID-19 has negatively affected the world in many ways and has completely changed how people live. The objective of this study is to explore and analyze the subreddit /r/Coronavirus, observing and visualizing trends in how COVID-related topics are discussed. The study uses Reddit’s API for data collection, MySQL for database management, and Python to present the findings. NLP techniques applied during data analysis include text pre-processing, topic modelling, and sentiment analysis, each carried out with existing libraries. The results showed that some negative sentiment was present among the topics discussed and that vaccines were commonly mentioned as a key topic. These results may be applied further to improve how topics are identified and interpreted.
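As a rough illustration of the analysis stages named in the summary (text pre-processing, then sentiment scoring and topic identification), the following is a minimal sketch using only the Python standard library. It is not the study's implementation: the record does not name the libraries used, so the word lists, function names, and scoring rule here are all hypothetical. In particular, the term counting in `top_terms` is only a crude stand-in for topic modelling, not a topic model.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists for illustration only.
# They are assumptions and do not come from the study.
STOPWORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "in",
             "for", "i", "my", "this", "was", "so", "about", "got", "today"}
POSITIVE = {"good", "great", "relieved", "safe", "hope", "effective"}
NEGATIVE = {"bad", "worried", "scared", "sick", "death", "afraid"}


def preprocess(text):
    """Lowercase, strip URLs, keep alphabetic tokens, drop stopwords."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if t not in STOPWORDS]


def sentiment_score(tokens):
    """Return (positive - negative) / matched words, in [-1, 1].

    Returns 0.0 when no token appears in either word list.
    """
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


def top_terms(posts, n=3):
    """Count terms across posts; a crude stand-in for topic modelling."""
    counts = Counter(t for p in posts for t in preprocess(p))
    return [term for term, _ in counts.most_common(n)]
```

A real pipeline would likely replace the hand-written word lists with an established sentiment lexicon and the term counts with a proper topic model, reading posts from the database rather than from in-memory strings.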