Topic extraction and sentiment analysis of subreddit - /r/Singapore

Bibliographic Details
Main Author: Lim, Shaun Wei Min
Other Authors: Anwitaman Datta
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2020
Subjects: Engineering::Computer science and engineering
Online Access: https://hdl.handle.net/10356/137984
Institution: Nanyang Technological University
id sg-ntu-dr.10356-137984
record_format dspace
spelling sg-ntu-dr.10356-137984; 2020-04-21T04:07:54Z; School of Computer Science and Engineering; Anwitaman@ntu.edu.sg; Bachelor of Engineering (Computer Science); Final Year Project (FYP); en; SCSE19-0197; application/pdf
building NTU Library
country Singapore
collection DR-NTU
topic Engineering::Computer science and engineering
description Understanding how people interact with one another, and how those interactions relate to their personalities, has always depended on finding out what they are thinking about. Sentiment analysis is used to detect subjective information such as attitudes, opinions, tone and expression. Its importance has grown with the rise of social media usage. Data scientists seek out the opinions of others to gauge feelings about specific events or occurrences, given the ever-expanding importance of improving business and society in the 21st century. Users’ views center on their interactions and activities with one another, which are critical influencers of behavior. The purpose of this project is to investigate the sentiment of users’ comments in the /r/Singapore subreddit on a daily basis and to plot it on an interactive dashboard that lets researchers view the public’s sentiment for a particular day. This is achieved using the Reddit web APIs, a MySQL database and the Chart.js plotting library. The sentiment analysis is done on the backend, which consists of NLP cleaning methods and the NLTK VADER sentiment analyzer. The project then uses users’ comments to generate new, unseen text before retrieving its sentiment values. This is achieved by training models using GPT-2 and a Markov chain. The final results show that GPT-2 is better at generating new comments that reflect a user’s way of talking and sentiments. Such generated data could be used as fake reviews, comments, etc. in the online world.
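The daily aggregation step mentioned in the description can be illustrated roughly as follows. This is a minimal sketch, not the project's code, which this record does not include. It assumes each comment arrives as a (created_utc, body) pair, the Unix timestamp and text fields that Reddit exposes for comments. The names `toy_score`, `TOY_LEXICON` and `daily_sentiment` are hypothetical, and the toy lexicon scorer is only a stand-in so the example runs without downloads. With NLTK installed and the `vader_lexicon` resource downloaded, the scorer could instead be `lambda text: SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical toy lexicon: a stand-in for VADER, NOT the project's scorer.
TOY_LEXICON = {
    "good": 1.0, "great": 1.0, "love": 1.0,
    "bad": -1.0, "terrible": -1.0, "hate": -1.0,
}


def toy_score(text: str) -> float:
    """Mean polarity of the lexicon words found in the text (0.0 if none)."""
    # Whitespace tokenisation only; words with attached punctuation are ignored.
    hits = [TOY_LEXICON[w] for w in text.lower().split() if w in TOY_LEXICON]
    return mean(hits) if hits else 0.0


def daily_sentiment(
    comments: Iterable[Tuple[int, str]],
    score: Callable[[str], float] = toy_score,
) -> Dict[str, float]:
    """Average the score of all comments posted on each UTC calendar day."""
    buckets = defaultdict(list)
    for created_utc, body in comments:
        day = datetime.fromtimestamp(created_utc, tz=timezone.utc).date().isoformat()
        buckets[day].append(score(body))
    # Sort by day so the result is ready to feed to a time-series chart.
    return {day: mean(scores) for day, scores in sorted(buckets.items())}


if __name__ == "__main__":
    sample = [
        (1587427200, "great food at the hawker centre"),  # 2020-04-21 UTC
        (1587430800, "terrible queue though"),            # 2020-04-21 UTC
        (1587513600, "love the weather today"),           # 2020-04-22 UTC
    ]
    print(daily_sentiment(sample))
```

Passing the scorer in as a parameter keeps the bucketing logic independent of the sentiment model, so the same function would work with VADER compound scores or any other per-comment score.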