Job scam detection using classification algorithms

Scams are the most common type of cybercrime in Singapore, and most of them are job scams. Applicant Tracking Systems (ATS) and their automation capabilities make it easy for scammers to post fraudulent job listings on online recruitment portals such as Monster. It also allows them to easi...

Bibliographic Details
Main Author: Sim, Keith Shi Jie
Other Authors: Josephine Chong Leng Leng
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Subjects: Computer and Information Science
Online Access: https://hdl.handle.net/10356/181115
Institution: Nanyang Technological University
id sg-ntu-dr.10356-181115
record_format dspace
spelling sg-ntu-dr.10356-181115; 2024-11-14T12:31:38Z; Job scam detection using classification algorithms; Sim, Keith Shi Jie; Josephine Chong Leng Leng; College of Computing and Data Science; josephine.chong@ntu.edu.sg; Computer and Information Science; Bachelor's degree; 2024; Final Year Project (FYP); Sim, K. S. J. (2024). Job scam detection using classification algorithms. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/181115; en; SCSE23-0928; application/pdf; Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Computer and Information Science
description Scams are the most common type of cybercrime in Singapore, and most of them are job scams. Applicant Tracking Systems (ATS) and their automation capabilities make it easy for scammers to post fraudulent job listings on online recruitment portals such as Monster. They also allow scammers to easily collect up to 1,000 resumes a day. The objective of this study is to build on the foundational work of past researchers and identify the feature extraction techniques and classification models that are most effective at identifying fake job advertisements. This study applies modern Natural Language Processing (NLP) techniques, such as transformers and word embeddings, to the Employment Scam Aegean Dataset (EMSCAD) from the University of the Aegean to study their effectiveness. The models that used these techniques achieved the highest F1 scores in the study, highlighting their effectiveness in the classification task. These results support prior research and show that feature selection improves performance regardless of the classification model chosen. Additionally, embedding features generally perform better than a custom ruleset of features. Although these results show that transformers and word embeddings are effective, they are subject to limitations arising from the imbalanced EMSCAD dataset and the maximum sequence length of the transformer models used in this study. Hence, future work in this area can focus on creating a more robust, comprehensive and balanced dataset than EMSCAD, and on fine-tuning other transformer models, such as BigBird and Longformer, that are capable of handling longer text sequences.
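The description reports F1 scores on an imbalanced dataset. As a minimal sketch of why F1 is more informative than accuracy in that setting, the snippet below compares the two on toy labels. The label counts, the `always_legit` baseline and the `detector` predictions are hypothetical values chosen for illustration; they are not taken from the study or from EMSCAD.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are 0 or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 95 legitimate ads (0) and 5 fraudulent ads (1).
y_true = [0] * 95 + [1] * 5

# Baseline that labels every ad as legitimate.
always_legit = [0] * 100

# Hypothetical detector: 2 false alarms, 4 of the 5 scams caught.
detector = [0] * 93 + [1] * 2 + [1] * 4 + [0] * 1

for name, y_pred in [("always_legit", always_legit), ("detector", detector)]:
    print(f"{name}: accuracy={accuracy(y_true, y_pred):.2f}, "
          f"F1={f1_score(y_true, y_pred):.2f}")
```

On these toy labels the all-legitimate baseline scores high on accuracy while its F1 for the fraudulent class is zero, which is the kind of gap that makes F1 the more useful figure when fraudulent ads are rare.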
author2 Josephine Chong Leng Leng
author_facet Josephine Chong Leng Leng; Sim, Keith Shi Jie
format Final Year Project
author Sim, Keith Shi Jie
author_sort Sim, Keith Shi Jie
title Job scam detection using classification algorithms
publisher Nanyang Technological University
publishDate 2024
url https://hdl.handle.net/10356/181115
_version_ 1816859054081835008