Event detection from social media on COVID-19

Bibliographic Details
Main Author: Ho, Yin Wee
Other Authors: Sun Aixin
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2022
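Subjects: Engineering::Computer science and engineering::Computing methodologies::Document and text processing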
Online Access: https://hdl.handle.net/10356/156483
Institution: Nanyang Technological University
id sg-ntu-dr.10356-156483
record_format dspace
date_added 2022-04-17T12:22:39Z
date_issued 2022
school School of Computer Science and Engineering
supervisor_email AXSun@ntu.edu.sg
degree Bachelor of Engineering (Computer Science)
type Final Year Project (FYP)
citation Ho, Y. W. (2022). Event detection from social media on COVID-19. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/156483
file_format application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Computing methodologies::Document and text processing
description Event detection has been one of the most important research topics in social media analysis this decade, owing to the wide availability of rich data generated by social media platforms, which have become a major source of information about real-world and trending events. However, detecting events is challenging because social media streams are dynamic and produce data at high volume. Most previous works were designed to detect either breaking news or localised events and overlooked other significant events. Furthermore, these works focused on Twitter data, and their techniques cannot be directly adopted for Facebook data. In this project, we implemented an event detection system based on word embeddings, adapted to detect events in our Facebook dataset. The system comprises four components: 1) Stream Splitter, 2) Word Embedder and Document Clustering (within individual time windows), 3) Document Clustering (across all time windows) and 4) Event Summarisation. In 1), we first applied natural language processing to our data and then split it into separate time windows. In 2), we embedded the documents with three different models (Skip-gram, TF-IDF and GloVe) and clustered them within their individual time windows using a modified version of the Jarvis-Patrick clustering algorithm. Document similarity was measured by the cosine similarity of each pair of documents, and two documents were placed in the same event cluster if their score exceeded a set threshold. In 3), we applied the same techniques as in the previous component, but clustered the event clusters across the entire time frame. Finally, in 4), we extracted a representative post and the five most frequently occurring words to describe each event cluster.
After tuning the hyperparameters to obtain the best possible results for each model, we found that TF-IDF produced the highest-quality events but detected only a moderate number of them. In contrast, Skip-gram and GloVe produced more events of slightly lower quality, although more work is needed to filter out the less significant ones. Finally, we tracked how some sample topics developed over time and how the public reacted to them. These insights can help characterise the public’s perception of certain topics, which can in turn help the authorities shape how they introduce those topics to the public.
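The four components described in the abstract can be illustrated with a minimal, self-contained Python sketch. It is not the project's implementation and rests on assumptions the record does not state: only TF-IDF embeddings are used (with a smoothed IDF and a small hypothetical stop-word list), since Skip-gram and GloVe need trained or pretrained vectors; the Stream Splitter is assumed to use fixed-width time windows; the "modified" Jarvis-Patrick rule is approximated as linking two documents that are mutual k-nearest neighbours, share at least `min_shared` of those neighbours and have a cosine similarity of at least `threshold`; the cross-window step reuses that rule on cluster centroids; and the representative post is assumed to be the one closest to the event centroid. All function and parameter names (`detect_events`, `window_seconds`, `k`, `min_shared`, `threshold`, `merge_threshold`, `top_n`) are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; the record does not describe the project's NLP steps.
STOP_WORDS = {"a", "an", "and", "the", "for", "in", "of", "on", "to", "is"}


def tokenize(text):
    """Lowercase the text, keep alphabetic runs and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf(docs):
    """Sparse TF-IDF vectors (term -> weight) with a smoothed IDF over all docs."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return [{t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(doc).items()}
            for doc in docs]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vecs):
    """Element-wise mean of a list of sparse vectors."""
    total = defaultdict(float)
    for v in vecs:
        for t, w in v.items():
            total[t] += w / len(vecs)
    return dict(total)


def components(n, edges):
    """Connected components of an undirected graph on nodes 0..n-1 (union-find)."""
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i
    for i, j in edges:
        parent[find(i)] = find(j)
    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    return list(groups.values())


def jarvis_patrick(vecs, k, min_shared, threshold):
    """Assumed variant: link i and j if they are mutual k-nearest neighbours,
    share >= min_shared of those neighbours and have cosine >= threshold."""
    n = len(vecs)
    sim = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]
    knn = [set(sorted((j for j in range(n) if j != i), key=lambda j: -sim[i][j])[:k])
           for i in range(n)]
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if j in knn[i] and i in knn[j]
             and len(knn[i] & knn[j]) >= min_shared
             and sim[i][j] >= threshold]
    return components(n, edges)


def detect_events(posts, window_seconds=86400, k=3, min_shared=1,
                  threshold=0.3, merge_threshold=0.3, top_n=5):
    """Hypothetical pipeline. posts: list of (unix_timestamp, text)."""
    tokens = [tokenize(text) for _, text in posts]
    vecs = tfidf(tokens)
    # 1) Stream splitter: bucket post indices into fixed-width windows (assumed).
    windows = defaultdict(list)
    for idx, (ts, _) in enumerate(posts):
        windows[ts // window_seconds].append(idx)
    # 2) Cluster the documents inside each time window.
    clusters = []
    for key in sorted(windows):
        idxs = windows[key]
        for comp in jarvis_patrick([vecs[i] for i in idxs], k, min_shared, threshold):
            clusters.append([idxs[c] for c in comp])
    # 3) Re-cluster the window-level clusters, via their centroids, across all windows.
    cents = [centroid([vecs[i] for i in c]) for c in clusters]
    events = [sorted(i for c in comp for i in clusters[c])
              for comp in jarvis_patrick(cents, k, min_shared, merge_threshold)]
    # 4) Summarise: top-N words; representative = post closest to centroid (assumed).
    summaries = []
    for event in events:
        counts = Counter(t for i in event for t in tokens[i])
        center = centroid([vecs[i] for i in event])
        summaries.append({
            "posts": event,
            "top_words": [w for w, _ in counts.most_common(top_n)],
            "representative": max(event, key=lambda i: cosine(vecs[i], center)),
        })
    return summaries
```

On six toy posts spanning two days (three about a vaccine booster rollout and three about mask rules in malls, with one post per topic on the second day), the default parameters group the posts into two events, one per topic, each joining a first-day cluster with a second-day singleton. Averaged Skip-gram or GloVe document vectors could replace `tfidf` if they are stored in the same sparse-dict form, because the clustering steps depend only on `cosine` and `centroid`.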
author2 Sun Aixin
format Final Year Project
author Ho, Yin Wee
title Event detection from social media on COVID-19
publisher Nanyang Technological University
publishDate 2022
url https://hdl.handle.net/10356/156483