(Mis)leading the COVID-19 vaccination discourse on Twitter: an exploratory study of infodemic around the pandemic
Main Authors: | , , |
---|---|
Other Authors: | |
Format: | Article |
Language: | English |
Published: | 2023 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/170558 |
Institution: | Nanyang Technological University |
Summary: | In this work, we collect a moderate-sized, representative corpus of over 200 000 tweets pertaining to COVID-19 vaccination, spanning a period of seven months (September 2020–March 2021). Following a transfer learning approach, we use a pretrained transformer-based XLNet model to classify tweets as misleading or non-misleading, and manually validate the results on random subsets of samples. We leverage these labels to study and contrast the characteristics of misleading tweets in the corpus against non-misleading ones. This exploratory analysis enables us to design features, such as sentiments, hashtags, nouns, and pronouns, which can in turn be exploited to classify tweets as (non-)misleading using various machine learning (ML) models in an explainable manner. Specifically, several ML models are employed for prediction, achieving up to 90% accuracy, and the importance of each feature is explained using the SHAP Explainable AI (XAI) tool. While the thrust of this work is principally exploratory, aiming to obtain insight into the online discourse on COVID-19 vaccination, we conclude the article by outlining how these insights provide the foundations for a more actionable approach to mitigating misinformation. We have made the curated data and the accompanying code available so that the research community at large can reproduce, compare against, or build upon this work. |
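The summary mentions hand-crafted features such as hashtags and pronouns that feed explainable ML classifiers. As a rough illustration only, the sketch below counts hashtags and pronoun groups in a single tweet using the Python standard library. The pronoun lists, the regex tokenizer, and the function name `tweet_features` are assumptions made for this example; they are not taken from the authors' released code. Sentiment and noun features would additionally require an NLP toolkit and are omitted here.

```python
import re
from collections import Counter

# Hypothetical pronoun groups (assumption): the paper's exact word lists are not given here.
PRONOUNS = {
    "first_person": {"i", "me", "my", "mine", "we", "us", "our", "ours"},
    "second_person": {"you", "your", "yours"},
    "third_person": {"he", "him", "his", "she", "her", "hers",
                     "they", "them", "their", "theirs"},
}

HASHTAG_RE = re.compile(r"#(\w+)")  # captures the tag text after '#'
WORD_RE = re.compile(r"[a-z']+")    # naive lowercase word tokenizer (assumption)


def tweet_features(text: str) -> dict:
    """Return simple count features for one tweet (illustrative sketch only)."""
    hashtags = [h.lower() for h in HASHTAG_RE.findall(text)]
    # Remove hashtags before tokenizing so a tag like '#we' is not counted as a pronoun.
    words = WORD_RE.findall(HASHTAG_RE.sub(" ", text).lower())
    counts = Counter(words)
    features = {"n_hashtags": len(hashtags), "n_words": len(words)}
    for group, vocab in PRONOUNS.items():
        # Sum occurrences of every pronoun in this group.
        features[group] = sum(counts[w] for w in vocab)
    return features
```

For example, `tweet_features("They say we must get the #vaccine now! #COVID19")` yields two hashtags, seven non-hashtag words, one first-person and one third-person pronoun. Count features of this kind are easy to attribute with a tool such as SHAP, which is one reason simple, interpretable features pair well with the explainable setup the summary describes.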