Fake news detection using social media data

With the large-scale shift to online social media, a large volume of "fake news", i.e., articles that purposefully contain false information, is being spread across the network [1]. Fake news can be produced for many purposes, such as financial or political gain, and can...

Bibliographic Details
Main Author:	Widjaja, Elbert
Other Authors:	Ke Yiping, Kelly
Format:	Final Year Project
Language:	English
Published:	Nanyang Technological University 2021
Subjects:	Engineering::Computer science and engineering
Online Access:	https://hdl.handle.net/10356/147544
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-147544
record_format	dspace
spelling	sg-ntu-dr.10356-147544 2021-04-06T08:25:37Z Fake news detection using social media data Widjaja, Elbert Ke Yiping, Kelly School of Computer Science and Engineering ypke@ntu.edu.sg Engineering::Computer science and engineering Bachelor of Engineering (Computer Science) 2021-04-06T08:25:37Z 2021-04-06T08:25:37Z 2021 Final Year Project (FYP) Widjaja, E. (2021). Fake news detection using social media data. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/147544 https://hdl.handle.net/10356/147544 en SCSE20-0449 application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering
spellingShingle	Engineering::Computer science and engineering Widjaja, Elbert Fake news detection using social media data
description	With the large-scale shift to online social media, a large volume of "fake news", i.e., articles that purposefully contain false information, is being spread across the network [1]. Fake news can be produced for many purposes, such as financial or political gain, and can have a negative impact on society. To mitigate that impact, it is crucial to develop a method for detecting fake news on social media. This project aims to identify the best state-of-the-art machine learning model for detecting fake news on social media. By researching and analyzing several data sources, experimenting with previously used models, and exploring new Transformer-based models, the project aims to determine which models classify news into their respective classes most accurately. In this report, the author reviews multiple data sources and applies exploratory data analysis to filter out biased datasets. The author defined three criteria for inspecting each dataset: amount of data, credibility, and bias. Applying these techniques and criteria, the author identified the data sources that are unbiased and fit for training. The report also explores the pre-processing steps applied to news articles. The author found that the level of text pre-processing needed depends on the data domain and the amount of data. By implementing multiple versions of data pre-processing, the author gained an understanding of the dataset domain and selected the most suitable pre-processing method. Based on this experiment, the author also identified a pattern in which pairings of machine learning algorithm and pre-processing technique yield the highest accuracy.
The experiment covers multiple machine learning algorithms, including Naïve Bayes, a word-embedding LSTM, and the newer transformer models. To evaluate model performance, the author splits the data into three sets (train, validation, and test) to mitigate overfitting and reduce bias. Accuracy is the main metric, supported by F1-score, precision, recall, and the Matthews correlation coefficient (MCC). These metrics support the author's choice of the best model without concern about overfitting. The experimental results show that the newest models, transformers, perform best among all models. They consistently perform at the highest level, surpassing the previous models by 5 to 15%. The transformer models consistently achieved the highest accuracy, around 87-88%, without overfitting and while using standard base parameters. The results indicate that transformer models (particularly ELECTRA and BERT) are the best state-of-the-art machine learning models for fake news classification. The experiments also suggest that further research could use larger parameter sizes, combined with generative upscaling and sentiment analysis, to obtain even higher performance.
author2	Ke Yiping, Kelly
author_facet	Ke Yiping, Kelly Widjaja, Elbert
format	Final Year Project
author	Widjaja, Elbert
author_sort	Widjaja, Elbert
title	Fake news detection using social media data
title_short	Fake news detection using social media data
title_full	Fake news detection using social media data
title_fullStr	Fake news detection using social media data
title_full_unstemmed	Fake news detection using social media data
title_sort	fake news detection using social media data
publisher	Nanyang Technological University
publishDate	2021
url	https://hdl.handle.net/10356/147544
_version_	1696984354442444800
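
The evaluation setup named in the description (a train/validation/test split, with accuracy supported by precision, recall, F1-score, and MCC) can be illustrated with a minimal sketch. This is not the author's code: the function names `three_way_split` and `binary_metrics`, the 10% validation and test fractions, and the convention that label 1 means "fake" are all assumptions made for illustration, since the record does not state them.

```python
import math
import random


def three_way_split(items, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle items and split them into train, validation, and test lists.

    The default fractions are illustrative assumptions, not values
    taken from the report.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, F1, and MCC for binary labels.

    `positive` is the label treated as the positive class (assumed here
    to be 1 = fake). Undefined ratios (zero denominator) are reported as 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def safe_div(num, den):
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "mcc": safe_div(tp * tn - fp * fn, mcc_den),
    }
```

In such a setup the validation set would typically guide model and pre-processing choices, while the test set is held back for the final comparison. MCC is a useful companion to accuracy because it uses all four confusion-matrix counts and stays near zero for a classifier that merely predicts the majority class.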

Similar Items