Fake news detection using social media data
With the large-scale shift to online social media, a large volume of "fake news", i.e., articles that purposefully contain false information, is being spread across the network [1]. Fake news can be produced for many purposes, such as financial or political gain, and can...
Main Author: | Widjaja, Elbert |
---|---|
Other Authors: | Ke Yiping, Kelly |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2021 |
Subjects: | Engineering::Computer science and engineering |
Online Access: | https://hdl.handle.net/10356/147544 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-147544 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-1475442021-04-06T08:25:37Z Fake news detection using social media data Widjaja, Elbert Ke Yiping, Kelly School of Computer Science and Engineering ypke@ntu.edu.sg Engineering::Computer science and engineering Bachelor of Engineering (Computer Science) 2021-04-06T08:25:37Z 2021-04-06T08:25:37Z 2021 Final Year Project (FYP) Widjaja, E. (2021). Fake news detection using social media data. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/147544 en SCSE20-0449 application/pdf Nanyang Technological University |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering |
description |
With the large-scale shift to online social media, a large volume of
"fake news", i.e., articles that purposefully contain false information, is being spread
across the network [1]. Fake news can be produced for many purposes, such as financial
or political gain, and can have a negative impact on society. To mitigate that
impact, it is crucial to develop a method to detect fake news on
social media.
This project sets out to identify the best state-of-the-art machine learning model for
detecting fake news on social media. By researching and analyzing several
data sources, experimenting with previously used models, and exploring newer
Transformer-based models, the project aims to determine which models classify
news into their respective classes most accurately.
In this report, the author reviews multiple data sources and applies
exploratory data analysis to filter out biased datasets. The author defined three
metrics for inspecting each dataset: amount of data, credibility, and bias. By applying
these techniques and metrics, the author was able to select the data sources
that are unbiased and suitable for training.
This report also explores the pre-processing steps applied to news articles. After
research, the author found that the level of text pre-processing needed is
determined by the data domain and the amount of data. By implementing multiple versions
of data pre-processing, the author was able to understand the dataset domain and
select the most suitable pre-processing method. Furthermore, based on this
experiment, the author was able to identify a pattern showing which pairings
of machine learning algorithm and pre-processing technique
yield the highest accuracy.
For this experiment, multiple machine learning algorithms are introduced, including Naïve Bayes,
a word-embedding LSTM, and the newer transformer models. To evaluate
model performance, the author splits the data into three sets (train, validation, and
test) to mitigate overfitting and reduce bias. With accuracy as the main
metric, the author also reports supporting metrics: F1-score,
precision, recall, and the Matthews correlation coefficient (MCC). These metrics support
the author's choice of the best model and reduce concern about overfitting.
The experimental results show that the newest models, transformers,
perform best among all models. They consistently score at the highest
benchmark, surpassing the previously developed models by 5
to 15%. The transformer models consistently reached the highest accuracy, around 87-88%,
without overfitting while using standard base parameters. The
results indicate that transformer models (particularly ELECTRA and BERT)
are the best state-of-the-art machine learning models for fake news classification
problems. The experiments also suggest that further research can be
done with larger parameter settings, combined with generative upscaling and sentiment
analysis, to obtain even higher performance. |
author2 |
Ke Yiping, Kelly |
author_facet |
Ke Yiping, Kelly Widjaja, Elbert |
format |
Final Year Project |
author |
Widjaja, Elbert |
title |
Fake news detection using social media data |
publisher |
Nanyang Technological University |
publishDate |
2021 |
url |
https://hdl.handle.net/10356/147544 |
_version_ |
1696984354442444800 |
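
The evaluation setup summarized in the description (a train/validation/test split, with accuracy supported by precision, recall, F1-score, and MCC) can be illustrated with a minimal sketch. This is not the author's code: the function names `three_way_split` and `binary_metrics`, the 80/10/10 split fractions, the seed, and the label convention (1 = fake, 0 = real) are all assumptions made for illustration, and only the Python standard library is used.

```python
import math
import random


def three_way_split(items, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle items and split them into train/validation/test lists.

    The fractions and the seed are illustrative assumptions, not values
    taken from the report.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, F1 and MCC for 0/1 labels.

    Assumes 1 marks the positive class (here: fake news). Any metric whose
    denominator is zero is reported as 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Matthews correlation coefficient: uses all four confusion-matrix cells,
    # so it stays informative when the classes are imbalanced.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


if __name__ == "__main__":
    train, val, test = three_way_split(range(100))
    print(len(train), len(val), len(test))

    # Toy labels for illustration only; not results from the report.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    print(binary_metrics(y_true, y_pred))
```

In a setup like this, the validation set would typically guide model and pre-processing choices, while the test set is held back for the final comparison. Reporting MCC next to accuracy helps show whether a high accuracy comes from balanced performance on both classes.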