Machine learning for email filtering and categorising

As digitalisation continues, email has become the primary communication channel for personal and business users. This project focuses on three Natural Language Processing (NLP) tasks: 1) Spam Filtering, 2) Categorising, and 3) Summarising. For each task, it uses the Enron spam dataset, AG news da...

Bibliographic Details
Main Author:	Tan, Kai Qin
Format:	Final Year Project / Dissertation / Thesis
Published:	2023
Subjects:	HG Finance
Online Access:	http://eprints.utar.edu.my/6154/1/TAN_KAI_QIN%2D1906282.pdf http://eprints.utar.edu.my/6154/
Institution:	Universiti Tunku Abdul Rahman

id	my-utar-eprints.6154
record_format	eprints
spelling	my-utar-eprints.6154 2023-12-12T08:24:18Z Machine learning for email filtering and categorising Tan, Kai Qin HG Finance 2023 Final Year Project / Dissertation / Thesis NonPeerReviewed application/pdf http://eprints.utar.edu.my/6154/1/TAN_KAI_QIN%2D1906282.pdf Tan, Kai Qin (2023) Machine learning for email filtering and categorising. Final Year Project, UTAR. http://eprints.utar.edu.my/6154/
institution	Universiti Tunku Abdul Rahman
building	UTAR Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Tunku Abdul Rahman
content_source	UTAR Institutional Repository
url_provider	http://eprints.utar.edu.my
topic	HG Finance
description	As digitalisation continues, email has become the primary communication channel for personal and business users. This project focuses on three Natural Language Processing (NLP) tasks: 1) spam filtering, 2) categorising, and 3) summarising, using the Enron spam dataset, the AG News dataset, and the XSum dataset, respectively. Owing to the unprecedented growth in email transactions, businesses generally require an automated email management system to manage their mailboxes, with applications in customer service and internal email. The project covers classical machine learning methods, conventional neural networks, and transformers. For each task, the models are compared, and the one with the highest accuracy and F1 score is selected. The best-performing models are Long Short-Term Memory (LSTM), Bi-directional LSTM (Bi-LSTM), and PEGASUS for spam filtering, categorising, and summarising, respectively. LSTM and Bi-LSTM achieved the highest accuracy on the filtering and categorising tasks, at 99% and 92%, respectively. Similarly, the PEGASUS transformer achieved a summary similarity score about 15% higher in all categories than the conventional neural network. The comparison concludes that limitations on training and machine specification affect the transformer’s performance in categorisation. Under these limitations, conventional neural networks have the upper hand in text categorisation, but transformers showed better resilience in summarisation owing to their unique training method. Interestingly, neither the neural network nor the transformer could reliably distinguish between similar categories, resulting in slightly lower accuracy. Furthermore, the project presents a web-based interface for the three tasks to demonstrate the feasibility of the selected model in each task.
format	Final Year Project / Dissertation / Thesis
author	Tan, Kai Qin
title	Machine learning for email filtering and categorising
publishDate	2023
url	http://eprints.utar.edu.my/6154/1/TAN_KAI_QIN%2D1906282.pdf http://eprints.utar.edu.my/6154/
